Abstract

We propose a probabilistic formulation that enables sequential detection of multiple
change points in a network setting. We present a class of sequential detection rules for
certain functionals of change points (minimum among a subset), and prove their asymptotic
optimality properties in terms of expected detection delay time. Drawing from graphical
model formalism, the sequential detection rules can be implemented by a computationally
efficient message-passing protocol which may scale up linearly in network size and in
waiting time. The effectiveness of our inference algorithm is demonstrated by simulations.

1 Introduction 

Classical sequential detection is the problem of detecting changes in the distribution of data 
collected sequentially over time [2]. In a decentralized network setting, the decentralized 
sequential detection problem concerns data sequences aggregated over the network, while
sequential detection rules are constrained to the network structure (see, e.g., [21 IU El E] ) . 
The focus was still on a single change point variable taking values in (discrete) time. In this 
paper, our interests lie in sequential detection in a network setting, where multiple change 
point variables may be simultaneously present. 

As an example, quickest detection of traffic jams concerns multiple potential hotspots
(i.e., change points) spatially located across a highway network. A simplistic approach is to
treat each change point variable independently, so that the sequential analysis of individual
change points can be applied separately. However, it has been shown that accounting for the 
statistical dependence among the change point variables can provide significant improvement 
in reducing both false alarm probability and detection delay time [8]. 

This paper proposes a general probabilistic formulation for the multiple change point
problem in a network setting, adopting the perspective of probabilistic graphical models for
multivariate data [9]. We consider estimating functionals of multiple change points defined globally
and locally across the network. The probabilistic formulation enables the borrowing of
statistical strength from one network site (associated with a change point variable) to another. We
propose a class of sequential detection rules, which can be implemented in a message-passing
and distributed fashion across the network. The computation of the proposed sequential rules
scales linearly in both network size and waiting time, while the cost of an approximate version
is constant in waiting time. The proposed detection rules are shown to be asymptotically
optimal in a Bayesian setting. Interestingly, the expected detection delay can be
expressed in terms of Kullback-Leibler divergences defined along edges of the network
structure. We provide simulations that demonstrate both statistical and computational efficiency
of our approach.

Related Work. The rich statistical literature on sequential analysis tends to focus almost 
entirely on the inference of a single change point variable [21 [10]. There are recent formulations 
for sequential diagnosis of a single change point, which may be associated with multiple 
causes or multiple sequences [12] . Another approach taken in [13] considers a change 
propagating in a Markov fashion across an array of sensors. These are interesting directions 
but the focus is still on detecting the onset of a single event. Graphical models have been 
considered for distributed learning and decentralized detection before, but not in the sequential 
setting 115] . This paper follows the line of work of [H [16], but our formulation based on 
graphical models is more general, and we impose less severe constraints on the amount of 
information that can be exchanged across network sites. 

Notation. We will use $P$ to denote densities w.r.t. some underlying measure (usually
understood from the context), while $\mathbb{P}$ is used to denote probability measures. $[d]$ denotes the set of
integers $\{1, \dots, d\}$. For a real-valued function $f$ defined on some space, $\|f\|_\infty := \sup_x |f(x)|$
denotes its uniform norm. In an undirected graph, the neighborhood of a node $i$ is denoted
as $\partial i$.

2 Graphical model for multiple change points 

In this section, we shall formulate the multiple change point detection problem, where the
change point variables and observed data are linked using a graphical model. Consider a
sensor network with $d$ sensors, each of which is associated with a random variable $\lambda_j \in \mathbb{N}$, for
$j \in [d] := \{1, 2, \dots, d\}$, representing a change point, the time at which a sensor fails to function
properly. We are interested in detecting these change points as accurately and as early as
possible, using the data that are associated with (e.g., observed by) the sensors. Taking a
Bayesian approach, each $\lambda_j$ is independently endowed with a prior distribution $\pi_j(\cdot)$.

Figure 1: Left panel illustrates a statistical graph, which induces a graphical model in the
middle panel. Right panel illustrates statistical messages passed at time $n$ along some edges
in a communication graph (which coincides with the statistical graph in this case).

A central ingredient in our formalism is the notion of a statistical graph, denoted as
$G = (V, E)$, which specifies the probabilistic linkage between the change point variables and
observed data collected in the network (cf. Fig. 1). The vertex set of the graph, $V = [d]$,
represents the indices of the change point variables $\lambda_j$. The edge set $E$ represents pairings of
change point variables, $E = \{e = \{s_1, s_2\} : s_1, s_2 \in V\}$. With each vertex and each edge, we
associate a sequence of observation variables,

$$\mathbf{X}_j = (X_j^1, X_j^2, \dots), \quad j \in V,$$

$$\mathbf{X}_e = (X_e^1, X_e^2, \dots), \quad e \in E,$$

where the superscript denotes the time index. The sequence $\mathbf{X}_j$ models the private information of node
$j$, while $\mathbf{X}_e$ models the shared information of nodes connected by $e$. We will use the notation

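$\mathbf{X}_j^n = (X_j^1, \dots, X_j^n)$ and similarly for $\mathbf{X}_e^n$; notice the distinction between $X_j^n$, the observation
at time $n$, versus bold $\mathbf{X}_j^n$, the observations up to time $n$, both at node $j$. The aggregate of
all the observations in the network is denoted as $\mathbf{X}_* = (\mathbf{X}_j, j \in V, \mathbf{X}_e, e \in E)$. Similarly, $\mathbf{X}_*^n$
represents all the observations up to time $n$. We will also use $\lambda_* = (\lambda_j, j \in V)$.
The joint distribution of $\lambda_*$ and $\mathbf{X}_*^n$ is given by a graphical model,

$$P(\lambda_*, \mathbf{X}_*^n) = \prod_{j \in V} \pi_j(\lambda_j) \prod_{j \in V} P(\mathbf{X}_j^n \mid \lambda_j) \prod_{e \in E} P(\mathbf{X}_e^n \mid \lambda_{s_1}, \lambda_{s_2}). \qquad (3)$$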

Given $\lambda_j = k$, we assume $X_j^1, \dots, X_j^{k-1}$ to be i.i.d. with density $g_j$ and $X_j^k, X_j^{k+1}, \dots$ to be
i.i.d. with density $f_j$. Given $(\lambda_{s_1}, \lambda_{s_2})$, we assume that the distribution of $\mathbf{X}_e^n$ only depends on
$\lambda_e := \lambda_{s_1} \wedge \lambda_{s_2}$, the minimum of the two change points; hence we often write $P(\mathbf{X}_e^n \mid \lambda_e)$ instead
of $P(\mathbf{X}_e^n \mid \lambda_{s_1}, \lambda_{s_2})$. Given $\lambda_e = k$, $X_e^1, \dots, X_e^{k-1}$ are i.i.d. with density $g_e$ and $X_e^k, X_e^{k+1}, \dots$
are i.i.d. with density $f_e$. All the densities are assumed to be with respect to some underlying
measure $\mu$. These specifications can be summarized as,

$$P(\mathbf{X}_j^n \mid k) = \prod_{t=1}^{k-1} g_j(X_j^t) \prod_{t=k}^{n} f_j(X_j^t) \qquad (4)$$

and similarly for $P(\mathbf{X}_e^n \mid \lambda_e)$. We will assume the prior on $\lambda_j$ to be geometric with parameter
$p_j \in (0, 1)$, i.e. $\pi_j(k) := (1 - p_j)^{k-1} p_j$, for $k \in \mathbb{N}$. Note that these change point variables are
dependent a posteriori, despite being independent a priori.
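The likelihood (4) and the geometric prior are simple enough to sketch in code. The following is a minimal illustration for a single node; the function names, the made-up data, and the choice of unit-variance Gaussian densities (pre-change mean 1, post-change mean 0, borrowed from the simulation setup of Section 5) are assumptions made only for this example.

```python
import math

def log_lik_given_change(x, k, log_g, log_f):
    """Log of (4): x[t-1] is the observation at time t = 1..n, change point k >= 1."""
    return sum(log_g(v) if t < k else log_f(v) for t, v in enumerate(x, start=1))

def geometric_prior(k, p):
    """pi(k) = (1 - p)^(k - 1) * p for k = 1, 2, ..."""
    return (1.0 - p) ** (k - 1) * p

# Illustrative densities (an assumption): unit-variance Gaussians,
# pre-change mean 1 and post-change mean 0.
def log_g(v):
    return -0.5 * (v - 1.0) ** 2 - 0.5 * math.log(2.0 * math.pi)

def log_f(v):
    return -0.5 * v ** 2 - 0.5 * math.log(2.0 * math.pi)

x = [1.2, 0.7, 0.1, -0.4, 0.3]  # made-up data, n = 5
n = len(x)

# Unnormalized log-posterior of the change point over k = 1..n
log_post = [math.log(geometric_prior(k, 0.1)) + log_lik_given_change(x, k, log_g, log_f)
            for k in range(1, n + 1)]
```

Note that for any `k > n` the likelihood only involves the pre-change density, so it does not depend on `k`; this is the single-node version of the constancy property used later to lump all states beyond time n.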

2.1 Sequential rules and optimality 

Although our primary interest is in sequential estimation of the change points $\lambda_* = (\lambda_j)$, we
are in general interested in the following functionals,

$$\phi := \phi(\lambda_*) := \lambda_S := \min_{j \in S} \lambda_j, \qquad (5)$$

for some subset $S \subset [d]$. Examples include a single change point $S = \{j\}$, the earliest among a
pair $S = \{i, j\}$ and the earliest in the entire network $S = [d]$. Let $\mathcal{F}_n = \sigma(\mathbf{X}_*^n)$ be the $\sigma$-algebra
induced by the sequence $\mathbf{X}_*^n$. A sequential detection rule for $\phi$ is formally a stopping time $\tau$
with respect to the filtration $(\mathcal{F}_n)_{n \ge 0}$. To emphasize the subset $S$, we will use $\tau_S$ to denote a rule
when the functional is $\phi = \lambda_S$. For example, $\tau_1$ is a detection rule for $\lambda_1$ and $\tau_{12}$ is a rule for
$\lambda_{12} = \lambda_1 \wedge \lambda_2$.

In choosing $\tau$, there is a trade-off between the false alarm probability $\mathbb{P}(\tau < \phi)$ and
the detection delay $\mathbb{E}(\tau - \phi)_+$. Here, we adopt the Neyman-Pearson setting to consider all
stopping rules for $\phi$ having false alarm at most $\alpha$,

$$\Delta_\phi(\alpha) := \{\tau : \mathbb{P}(\tau < \phi) \le \alpha\}, \qquad (6)$$

and pick a rule in $\Delta_\phi(\alpha)$ that has minimum detection delay.

2.2 Communication graph and message passing (MP) 

Another ingredient of our formalism is the notion of a communication graph representing
constraints under which the data can be transmitted across the network to compute a particular
stopping rule, say $\tau_j$. In general, such a rule depends on all the aggregated data $\mathbf{X}_*^n$. We
are primarily interested in those rules that can be implemented in a distributed fashion by
passing messages from one sensor only to its neighbors in the communication graph. Although,
conceptually, the statistical graph and the communication graph play two distinct roles, they
usually coincide in practice and this will be assumed throughout this paper. See Fig. 1 for an
illustration.



3 Proposed stopping rules 

In general, we suspect that obtaining strictly optimal rules in closed form is not possible 
for the multiple change point problem introduced earlier; more crucially such rules are not 
computationally tractable for large networks. In this section, we shall present a class of 
detection rules that scale linearly in the size of the network, d, and can be implemented in a 
distributed fashion by message passing. 

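Consider the following posterior probabilities

$$\gamma_S^n(k) := \mathbb{P}(\lambda_S = k \mid \mathbf{X}_*^n), \qquad (7)$$

$$\gamma_S^n[n] := \mathbb{P}(\lambda_S \le n \mid \mathbf{X}_*^n) = \sum_{k=1}^{n} \gamma_S^n(k). \qquad (8)$$

We propose to stop at the first time $\gamma_S^n[n]$ goes above a threshold,

$$\tau_S = \inf\{n \in \mathbb{N} : \gamma_S^n[n] \ge 1 - \alpha\}, \qquad (9)$$

where $\alpha$ is the maximum tolerable false alarm. It is easily verified that these rules have a
false alarm at most $\alpha$.

Lemma 1. For $\phi = \lambda_S$, the rule $\tau_S \in \Delta_\phi(\alpha)$.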
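To make the threshold rule (9) concrete, here is a minimal sketch for the simplest case of a single node using only its private data, which needs no message passing. The helper names, the toy data, and the Gaussian log-densities (written without their common normalizing constant, which cancels in the posterior) are illustrative assumptions and not part of the paper's algorithm.

```python
import math

def posterior_change_by_n(x, p, log_g, log_f):
    """P(lambda <= n | x_1..x_n) under a geometric(p) prior, with n = len(x)."""
    n = len(x)
    logf = [log_f(v) for v in x]
    logg = [log_g(v) for v in x]
    # suffix[k] = sum_{t=k}^{n} log f(x_t), with suffix[n+1] = 0
    suffix = [0.0] * (n + 2)
    for t in range(n, 0, -1):
        suffix[t] = suffix[t + 1] + logf[t - 1]
    terms = []
    prefix = 0.0  # sum_{t<k} log g(x_t)
    for k in range(1, n + 1):
        log_prior = (k - 1) * math.log1p(-p) + math.log(p)
        terms.append(log_prior + prefix + suffix[k])
        prefix += logg[k - 1]
    # All states k > n are lumped: prior mass (1-p)^n, likelihood all pre-change.
    tail = n * math.log1p(-p) + prefix
    m = max(terms + [tail])  # subtract the max for numerical stability
    num = sum(math.exp(v - m) for v in terms)
    return num / (num + math.exp(tail - m))

def stopping_time(xs, p, alpha, log_g, log_f):
    """First n at which the posterior of {lambda <= n} reaches 1 - alpha, else None."""
    for n in range(1, len(xs) + 1):
        if posterior_change_by_n(xs[:n], p, log_g, log_f) >= 1.0 - alpha:
            return n
    return None

# Assumed densities: unit-variance Gaussians, pre-change mean 1, post-change mean 0.
log_g = lambda v: -0.5 * (v - 1.0) ** 2
log_f = lambda v: -0.5 * v ** 2

xs = [1.0] * 10 + [0.0] * 40  # idealized data with a change at time 11
tau = stopping_time(xs, p=0.1, alpha=0.01, log_g=log_g, log_f=log_f)
```

The sketch recomputes the posterior from scratch at every step for clarity; a practical implementation would update it recursively.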

More interestingly, we will show in Section 4 that $\tau_S$ is asymptotically optimal for detecting
$\lambda_S$. First, let us look at two message-passing (MP) implementations of the stopping rule (9).

3.1 Exact message passing algorithm

It is relatively simple to adapt the well-established belief propagation algorithm, also known
as sum-product, to the graphical model (3). The algorithm produces exact values of the
posterior $\gamma_S^n$, as defined in (7), in the cases where $G$ is a polytree (and provides a reasonable
estimate otherwise). In this section, we provide the details for $S = \{j\}$ or $S = \{i, j\} \in E$.

One issue in adapting the algorithm is the possible infinite support of $\gamma_S^n$. Thanks to
a "constancy" property of the likelihood, it is possible to lump all the states after $n$ when
computing $\gamma_S^n[n]$.

Lemma 2. Let $\{i_1, i_2, \dots, i_r\} \subset [d]$ be a distinct collection of indices. The function

$$(k_1, k_2, \dots, k_r) \mapsto P(\mathbf{X}_*^n \mid \lambda_{i_1} = k_1, \lambda_{i_2} = k_2, \dots, \lambda_{i_r} = k_r)$$

is constant over $\{n + 1, n + 2, \dots\}^r$.

See Appendix [A] for the proof. The algorithm is invoked at each time step n, by passing 
messages between nodes according to the following protocol: a node sends a message to one of 
its neighbors (in G) when and only when it has received messages from all its other neighbors. 
Message passing continues until any node can be linked to any other node by a chain of 
messages, assuming a connected graph. For a tree, this is usually achieved by designating a 
node as root and passing messages from the root to the leaves and then backwards. 

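The message that node $j$ sends to its neighbor $i$, at time $n$, is denoted as $m_{ji}^n = [m_{ji}^n(1), \dots, m_{ji}^n(n+1)] \in \mathbb{R}^{n+1}$
and computed as

$$m_{ji}^n(k) = \sum_{k'=1}^{n+1} \Big\{ \pi_j^n(k')\, P(\mathbf{X}_j^n \mid k')\, P(\mathbf{X}_{ij}^n \mid k \wedge k') \prod_{r \in \partial j \setminus \{i\}} m_{rj}^n(k') \Big\} \qquad (10)$$

for $k \in [n + 1]$, where

$$\pi_j^n(k) := \begin{cases} \pi_j(k) & \text{for } k \in [n], \\ \sum_{i=n+1}^{\infty} \pi_j(i) & \text{for } k = n + 1, \end{cases} \qquad (11)$$

and $\partial j$ is the neighborhood set of $j$. Once the message passing ends, $\gamma_j^n$ and $\gamma_{ij}^n$ are readily
available. We have

$$\gamma_j^n(k) \propto \pi_j^n(k)\, P(\mathbf{X}_j^n \mid k) \prod_{r \in \partial j} m_{rj}^n(k), \quad k \in [n]. \qquad (12)$$

It also holds for $k = n + 1$ if the LHS is interpreted as $\gamma_j^n[n]^c := 1 - \gamma_j^n[n]$.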

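The same messages can be used to compute $\zeta_{ij}^n(k_1, k_2) := \mathbb{P}(\lambda_i = k_1, \lambda_j = k_2 \mid \mathbf{X}_*^n)$ for
$\{i, j\} \in E$. We have

$$\zeta_{ij}^n(k_1, k_2) \propto \psi_{ij}^n(k_1, k_2) \prod_{r \in \partial i \setminus \{j\}} m_{ri}^n(k_1) \prod_{r \in \partial j \setminus \{i\}} m_{rj}^n(k_2) \qquad (13)$$

where

$$\psi_{ij}^n(k_1, k_2) := \pi_i^n(k_1)\, \pi_j^n(k_2)\, P(\mathbf{X}_i^n \mid k_1)\, P(\mathbf{X}_j^n \mid k_2)\, P(\mathbf{X}_{ij}^n \mid k_1 \wedge k_2) \qquad (14)$$

for $(k_1, k_2) \in [n]^2$, from which $\gamma_{ij}^n$ can be computed.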

Let us summarize the steps of the message passing algorithm in the case of a tree:

Message passing algorithm to compute the posteriors $\gamma_j^n[n]$ and $\gamma_{ij}^n[n]$
At each time $n$:

1. Designate a node of the tree, say node 1, as root and direct the edges to point away from
the root.

2. Initialize messages $m_{ji}^n \in \mathbb{R}^{n+1}$ (one for each directed edge $j \to i$) to the all ones vector.
Compute $\pi_j^n(k)$ for $k \in [n + 1]$, $j \in [d]$ according to (11).

3. Pass messages from a node $j$ to each of its descendants $i$ (that is, $i \in \partial j$ for which
$j \to i$ is a directed edge) according to equation (10). Do this recursively, starting from the
root ($j = 1$) until you reach each of the leaves.

4. Reverse the direction of the edges and repeat Step 3, this time starting from the leaves
and ending once you have reached the root. In computing $m_{ji}^n$ based on (10), use messages
computed in the previous step.

5. Compute $\gamma_j^n(k)$ for $k \in [n + 1]$ based on (12) and normalize so that $\sum_{k=1}^{n+1} \gamma_j^n(k) = 1$.
Let $\gamma_j^n[n] = \sum_{k=1}^{n} \gamma_j^n(k)$.

6. Compute $\zeta_{ij}^n(k_1, k_2)$ for $(k_1, k_2) \in [n + 1]^2$ based on (13) and (14) and normalize so that
$\sum_{k_1, k_2} \zeta_{ij}^n(k_1, k_2) = 1$. Let $\gamma_{ij}^n[n]$ be the sum of $\zeta_{ij}^n(k_1, k_2)$ over all pairs with $k_1 \wedge k_2 \le n$.



We have the following guarantee, which is a restatement of a well-known result for belief
propagation [17]:

Lemma 3. When $G$ is a tree, the message passing algorithm above produces correct values of
$\gamma_j^n$ and $\gamma_{ij}^n$ at time step $n$, with computational complexity $O((|V| + |E|)n)$.
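As a sanity check on the message update (10) and the marginal formula (12), the following sketch computes single-node posteriors on a small tree by message passing and compares them with brute-force enumeration over the lumped state space {1, ..., n+1}. It is only an illustration under several assumptions: messages are obtained by memoized recursion rather than the explicit two-sweep schedule, the sum in (10) is evaluated naively (so each message costs O(n^2) instead of the linear cost of Lemma 3), every node and edge shares the same pair of densities `g`, `f`, and all function and variable names are hypothetical.

```python
import itertools
import math

def seq_lik(x, k, g, f):
    """P(x_1..x_n | change point = k): density g before time k, f from k on."""
    return math.prod(g(v) if t < k else f(v) for t, v in enumerate(x, start=1))

def lumped_prior(k, n, p):
    """Geometric prior on {1..n}; state n+1 carries the tail mass P(lambda > n)."""
    return p * (1 - p) ** (k - 1) if k <= n else (1 - p) ** n

def potentials(nodes, edges, node_data, edge_data, p, g, f):
    n = len(node_data[nodes[0]])
    K = range(1, n + 2)
    phi = {j: {k: lumped_prior(k, n, p[j]) * seq_lik(node_data[j], k, g, f)
               for k in K} for j in nodes}
    elik = {frozenset(e): {k: seq_lik(edge_data[e], k, g, f) for k in K}
            for e in edges}
    return K, phi, elik

def posteriors_mp(nodes, edges, node_data, edge_data, p, g, f):
    K, phi, elik = potentials(nodes, edges, node_data, edge_data, p, g, f)
    nbrs = {j: [] for j in nodes}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    msgs = {}

    def message(j, i):
        """Message from j to i as in (10); memoized so each one is computed once."""
        if (j, i) not in msgs:
            incoming = [message(r, j) for r in nbrs[j] if r != i]
            e = elik[frozenset((i, j))]
            msgs[(j, i)] = {
                k: sum(phi[j][kp] * e[min(k, kp)]
                       * math.prod(m[kp] for m in incoming) for kp in K)
                for k in K}
        return msgs[(j, i)]

    post = {}
    for j in nodes:
        un = {k: phi[j][k] * math.prod(message(r, j)[k] for r in nbrs[j]) for k in K}
        z = sum(un.values())
        post[j] = {k: v / z for k, v in un.items()}
    return post

def posteriors_brute(nodes, edges, node_data, edge_data, p, g, f):
    K, phi, elik = potentials(nodes, edges, node_data, edge_data, p, g, f)
    post = {j: {k: 0.0 for k in K} for j in nodes}
    for ks in itertools.product(K, repeat=len(nodes)):
        a = dict(zip(nodes, ks))
        w = math.prod(phi[j][a[j]] for j in nodes)
        w *= math.prod(elik[frozenset(e)][min(a[e[0]], a[e[1]])] for e in edges)
        for j in nodes:
            post[j][a[j]] += w
    for j in nodes:
        z = sum(post[j].values())
        post[j] = {k: v / z for k, v in post[j].items()}
    return post

def gauss(mean):
    return lambda v: math.exp(-0.5 * (v - mean) ** 2) / math.sqrt(2 * math.pi)

# Toy chain 1 - 2 - 3 with made-up data (n = 4).
nodes, edges = [1, 2, 3], [(1, 2), (2, 3)]
node_data = {1: [0.9, 1.2, 0.1, -0.3], 2: [1.1, 0.2, -0.1, 0.4], 3: [0.8, 1.3, 0.9, 0.2]}
edge_data = {(1, 2): [1.0, 0.3, 0.0, -0.2], (2, 3): [1.2, 0.1, 0.2, 0.0]}
p = {1: 0.1, 2: 0.1, 3: 0.1}
mp = posteriors_mp(nodes, edges, node_data, edge_data, p, gauss(1.0), gauss(0.0))
bf = posteriors_brute(nodes, edges, node_data, edge_data, p, gauss(1.0), gauss(0.0))
```

In each returned dictionary, the entry at `k = n + 1` plays the role of the lumped state, so `1 - mp[j][n + 1]` is the quantity compared against the threshold in the stopping rule.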

4 Asymptotic optimality of MP rules 

This section contains our main result on the asymptotic optimality of stopping rule (9). To
simplify the statement of the results, let us extend the edge set to $\tilde{E} := E \cup \{\{j\} : j \in V\}$.
This allows us to treat the private data associated with node $j$, i.e. $\mathbf{X}_j$, as (shared) data
associated with a self-loop in the graph $(V, \tilde{E})$. For any $e \in \tilde{E}$, let $I_e := \int f_e \log \frac{f_e}{g_e}\, d\mu$ be the
KL divergence between $f_e$ and $g_e$. For $\phi = \lambda_S$, let

$$I_\phi := I_{\lambda_S} := \sum_{e \subset S} I_e \qquad (15)$$

where the sum runs over all $e \in \tilde{E}$ which are subsets of $S$. For example, for a chain graph
on $\{1, 2, 3\}$ with node 2 in the middle, $\tilde{E} = \{\{1, 2\}, \{2, 3\}, \{1\}, \{2\}, \{3\}\}$ and we have $I_{\lambda_{12}} :=
I_1 + I_2 + I_{12}$ while $I_{\lambda_{13}} := I_1 + I_3$. (Here, we abuse notation to write $I_{12}$ instead of $I_{\{1,2\}}$ and
so on.)

Recall the geometric prior on $\lambda_j$ (with parameter $p_j$) and the definition of $\phi = \lambda_S$ as
the minimum of $\lambda_j$, $j \in S$. Then, $\phi$ is geometrically distributed a priori with parameter
$1 - e^{-q_\phi} := 1 - \prod_{j \in S} (1 - p_j)$.
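The fact that the minimum of independent geometric variables is again geometric follows from a short tail computation. The derivation below is a sketch that uses only the prior independence of the change points stated in Section 2; the shorthand $\rho_S$ is introduced here purely for illustration.

```latex
\begin{align*}
\mathbb{P}(\lambda_S > k)
  &= \prod_{j \in S} \mathbb{P}(\lambda_j > k)
   = \prod_{j \in S} (1 - p_j)^{k}
   = (1 - \rho_S)^{k},
  \qquad \rho_S := 1 - \prod_{j \in S} (1 - p_j), \\
\mathbb{P}(\lambda_S = k)
  &= \mathbb{P}(\lambda_S > k - 1) - \mathbb{P}(\lambda_S > k)
   = (1 - \rho_S)^{k-1} \rho_S , \qquad k \in \mathbb{N}.
\end{align*}
```

For instance, with two nodes and $p_j = 0.1$ each, the minimum is geometric with parameter $1 - 0.9^2 = 0.19$.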

We can now state our main result on asymptotic optimality.

Theorem 1. (Optimal delay) Assume $\|\log \frac{f_e}{g_e}\|_\infty \le M$ for all $e \in \tilde{E}$, and geometric priors
for $\{\lambda_j\}$. Then, $\tau_S$ is asymptotically optimal for $\phi = \lambda_S$; more specifically, as $\alpha \to 0$,

$$\mathbb{E}[\tau_S - \lambda_S \mid \tau_S \ge \lambda_S] = \frac{|\log \alpha|}{-\sum_{j \in S} \log(1 - p_j) + I_\phi}\,(1 + o(1)) = \inf_{\tau \in \Delta_\phi(\alpha)} \mathbb{E}[\tau - \lambda_S \mid \tau \ge \lambda_S].$$

Remark 1. Let us highlight some particular cases of interest in this result. To simplify
notation, let $\bar{p}_j := 1 - p_j$.

• For $\phi = \lambda_1 \wedge \cdots \wedge \lambda_d$ (the minimum of all the change points), the asymptotic optimal
delay is

$$\frac{|\log \alpha|}{-\sum_{j \in V} \log \bar{p}_j + \sum_{j \in V} I_j + \sum_{e \in E} I_e}\,(1 + o(1)).$$

• For $\phi = \lambda_i \wedge \lambda_j$, the asymptotic optimal delay is

$$\frac{|\log \alpha|}{-\log \bar{p}_i - \log \bar{p}_j + I_i + I_j + I_{ij}\, 1\{\{i,j\} \in E\}}\,(1 + o(1)),$$

where $1\{\{i,j\} \in E\}$ is an indicator function, i.e., equal to 1 if $\{i, j\}$ is an edge and zero
otherwise.

• For $\phi = \lambda_j$, the asymptotic optimal delay is

$$\frac{|\log \alpha|}{-\log \bar{p}_j + I_j}\,(1 + o(1)).$$

Remark 2. A particular feature of the asymptotic delay is the decomposition (15) of
information along the edges of the graph. This is more clearly seen in the case of a paired delay
$\phi = \lambda_{ij}$, for which the information $I_\phi = I_i + I_j + I_{ij}\, 1\{\{i,j\} \in E\}$ increases (hence the asymptotic
delay decreases) if there is an edge between nodes $i$ and $j$. This has no counterpart in the
classical theory where one looks at change points independently.

Remark 3. Another feature of the result is observed for a single delay, say $\phi = \lambda_1$, where one
has $I_\phi = I_1$ regardless of whether there are edges between node 1 and the rest of the nodes.
Thus, the asymptotic delay for the threshold rule which bases its decision on the posterior
probability of $\lambda_1$ given all the data in the network ($\mathbf{X}_*^n$) is the same as the one which bases
its decision on the posterior given only private data of node 1 ($\mathbf{X}_1^n$). Although this rather
counter-intuitive result holds asymptotically, the simulations show that even for moderately
low values of $\alpha$, having access to extra information in $\mathbf{X}_*^n$ does indeed improve performance
as one expects (cf. Section 5).

Remark 4. The assumptions of bounded likelihood ratios ($\|\log \frac{f_e}{g_e}\|_\infty \le M$) and geometric
priors on $\{\lambda_j\}$ are crucial for our proof technique. The geometric distribution can be relaxed
to any distribution with exponential tails, but we cannot allow for more heavy-tailed priors.
A brief explanation is provided after stating Theorem 2 in Section 6. This theorem is a
key ingredient in our argument and relies heavily on these assumptions. The exponential tails
assumption is also used in the decoupling Lemma 6.

Figure 2: Plots of the slope $\frac{1}{-\log \alpha} \mathbb{E}[\tau_S - \phi \mid \tau_S \ge \phi]$ against $-\log \alpha$ for the message-passing
algorithm (MP) and the SINGLE algorithm, which disregards shared information. The graph is the
star graph of 4 nodes with node 2 in the center. Estimates of both single and paired change
points ($\lambda_j$ and $\lambda_{ij}$) are shown together with the theoretical limit of Theorem 1. False alarm
tolerance $\alpha$ ranges in $[0.5, 10^{-13}]$.



5 Simulations 

We present simulation results as depicted in Fig. 2. The setting is that of graphical model (3)
on $d = 4$ nodes, where the statistical graph is a star with node 2 in the middle. Conditioned
on $\lambda_*$, all the data sequences, $\mathbf{X}_*$, are assumed Gaussian of variance 1, with pre-change mean
1 and post-change mean zero. All priors are geometric with parameters $p_j = 0.1$. Fig. 2
shows plots of expected delay over $|\log \alpha|$, against $|\log \alpha|$, for two methods: the
message-passing algorithm of Section 3.1 (MP) and the method which bases its inference on posteriors
calculated based only on each node's private information (SINGLE). This latter method
estimates a single change point $\lambda_j$ by $\tau_j := \inf\{n : \mathbb{P}(\lambda_j \le n \mid \mathbf{X}_j^n) \ge 1 - \alpha\}$ and a paired
$\lambda_{ij} = \lambda_i \wedge \lambda_j$ by $\tau_i \wedge \tau_j$. Also shown in the figure is the limiting value of the normalized expected
delay as predicted by Theorem 1. All plots are generated by Monte Carlo simulation over
5000 realizations.

In estimating single change points, MP, which takes shared information into account, has
a clear advantage over SINGLE, for high to relatively low false alarm values (even, say, around
$\alpha \approx e^{-5}$); though, both methods seem to converge to the same slope in the $\alpha \to 0$ limit, as
suggested by Theorem 1. (The particular value is $(-\log 0.9 + 0.5)^{-1} = 1.6519$.) Also note
that the advantage of MP over SINGLE is more emphasized for node 2, as expected by its
access to shared information from all three of the other nodes.

For paired change points, the advantage of MP over SINGLE is more emphasized. It is
also interesting to note that while MP seems to converge to the expected theoretical limit
$(-2 \log 0.9 + 3 \cdot 0.5)^{-1} = 0.5845$, SINGLE seems to converge to a higher slope (with a reasonable
guess being 1.6519 as in the case of single change points).
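The two limiting slopes quoted above can be reproduced directly from the formulas in Remark 1. The short check below is only an arithmetic illustration; the helper names are made up, and it uses the standard fact that the KL divergence between two unit-variance Gaussians is half the squared difference of their means, which gives 0.5 for means 0 and 1.

```python
import math

def kl_gaussian_unit_var(mu_f, mu_g):
    """KL divergence between N(mu_f, 1) and N(mu_g, 1)."""
    return 0.5 * (mu_f - mu_g) ** 2

def limiting_slope(ps, infos):
    """1 / (-sum_j log(1 - p_j) + sum of KL informations), as in Remark 1."""
    q = -sum(math.log(1.0 - p) for p in ps)
    return 1.0 / (q + sum(infos))

I = kl_gaussian_unit_var(0.0, 1.0)  # post-change mean 0, pre-change mean 1

# Single change point: one prior term, private information only.
single = limiting_slope([0.1], [I])

# Paired change point across an edge: two prior terms, I_i + I_j + I_ij.
paired = limiting_slope([0.1, 0.1], [I, I, I])
```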

In regard to false alarm probability, nonzero values were only observed for the first few
values of $\alpha$ considered here, and those were either below or very close to the specified tolerance.

6 Concentration inequalities for marginal likelihood ratios

In this section, we lay the groundwork for the proof of Theorem 1. The main result here is
Theorem 2, which establishes concentration inequalities for various terms that appear in an
asymptotic expansion of the marginal likelihood ratio, defined in (17) below. These terms
(cf. (23) and (24)) are natural by-products of marginalization over a graph and their
asymptotic behavior might be of independent interest.

Our standing assumption throughout is that the graph G = (V, E) is complete. This 
simplifies the arguments without loss of generality, since one can otherwise make the graph 
complete, by assigning sequences of i.i.d. data to each non-edge (with the same pre- and 
post- change distributions). These i.i.d. data do not affect the likelihood (as can be verified 
by examining the representation of Lemma [5]) and they do not contribute to asymptotic delay 
since the corresponding KL informations are zero. 

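Fix some delay functional $\phi = \lambda_S$ throughout this section. We use the following notation
regarding conditional probabilities and expectations

$$\mathbb{P}_\phi^k := \mathbb{P}(\,\cdot \mid \phi = k), \quad \mathbb{E}_\phi^k := \mathbb{E}(\,\cdot \mid \phi = k),$$
$$\mathbb{P}^{m_*} := \mathbb{P}(\,\cdot \mid \lambda_* = m_*), \quad \mathbb{E}^{m_*} := \mathbb{E}(\,\cdot \mid \lambda_* = m_*),$$

for $k \in \mathbb{N}$ and $m_* = (m_1, \dots, m_d) \in \mathbb{N}^d$. Here $\{\lambda_* = m_*\} = \bigcap_{j=1}^{d} \{\lambda_j = m_j\}$. Furthermore, let

$$\pi_\phi^k(m_*) := \mathbb{P}(\lambda_* = m_* \mid \phi = k). \qquad (16)$$

Consider the marginal likelihood ratio

$$D_\phi^k(\mathbf{X}_*^n) := \frac{P(\mathbf{X}_*^n \mid \phi = k)}{P(\mathbf{X}_*^n \mid \phi = \infty)}. \qquad (17)$$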

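Our asymptotic analysis hinges on the behavior of $\frac{1}{n} \log D_\phi^k(\mathbf{X}_*^n)$ as $n \to \infty$, under the probability
measure $\mathbb{P}_\phi^k$. In particular, as a direct consequence of the results of [18], if one can show that

$$\mathbb{P}_\phi^k \Big[ \frac{1}{N} \max_{1 \le n \le N} \log D_\phi^k(\mathbf{X}_*^{k+n-1}) \ge (1 + \varepsilon) I_\phi \Big] \to 0 \quad \text{as } N \to \infty \qquad (18)$$

for all (small) $\varepsilon > 0$ and all $k \in \mathbb{N}$, then the "lower bound" follows,

$$\inf_{\tau \in \Delta_\phi(\alpha)} \mathbb{E}[\tau - \phi \mid \tau \ge \phi] \ge \frac{|\log \alpha|}{-\sum_{j \in S} \log(1 - p_j) + I_\phi}\,(1 + o(1)).$$

Furthermore, let

$$T_\varepsilon^k := \sup\Big\{n \in \mathbb{N} : \frac{1}{n} \log D_\phi^k(\mathbf{X}_*^{k+n-1}) < I_\phi - \varepsilon\Big\}.$$

By the results of [18], if one has

$$\mathbb{E} T_\varepsilon := \sum_{k=1}^{\infty} \mathbb{P}(\phi = k)\, \mathbb{E}_\phi^k(T_\varepsilon^k) < \infty, \qquad (19)$$

for all (small) $\varepsilon > 0$, then the "upper bound" follows, that is, $\tau_S$ as defined in (9) satisfies

$$\mathbb{E}[\tau_S - \phi \mid \tau_S \ge \phi] \le \frac{|\log \alpha|}{-\sum_{j \in S} \log(1 - p_j) + I_\phi}\,(1 + o(1)).$$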

The following lemma provides sufficient conditions based on concentration inequalities
under conditional probability measures $\mathbb{P}^{m_*}$. In the following, $\varepsilon_0 > 0$ is some constant. (See
Appendix D for the proof.)

Lemma 4. Assume that for all $m_* \in \mathbb{N}^d$ for which $\pi_\phi^k(m_*) > 0$, one has

$$\mathbb{P}^{m_*}\Big\{ \Big| \frac{1}{n} \log D_\phi^k(\mathbf{X}_*^n) - I_\phi \Big| > \varepsilon \Big\} \le q(n) \exp(-c\, n\, \varepsilon^2) \qquad (20)$$

for all $n \in \mathbb{N}$ and $\varepsilon \in (0, \varepsilon_0)$ such that $\sqrt{n}\, \varepsilon \ge p(m_*, k)$, where $p(\cdot)$ and $q(\cdot)$ are polynomials
with constant nonnegative coefficients. Furthermore, assume that both $\pi_\phi^k(\cdot)$ and $\mathbb{P}(\phi = \cdot)$
have finite polynomial moments. Then both (18) and (19) hold, hence Theorem 1 holds.

Remark 1. The condition of finite polynomial moments for $\pi_\phi^k(\cdot)$ and $\mathbb{P}(\phi = \cdot)$ is satisfied for
$\phi = \min_{j \in S} \lambda_j$ under geometric priors on $\{\lambda_j\}$.

In order to apply Lemma 4 easily, we introduce a notion of "stochastic asymptotic
$\varepsilon$-equivalence" for sequences of random variables. To simplify notation, let
$\mathrm{supp}(\pi_\phi^k) := \{m_* \in \mathbb{N}^d : \pi_\phi^k(m_*) > 0\}$.

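Definition 1. Consider two sequences $\{a_n\}$ and $\{b_n\}$ of random variables, where $a_n = a_n(k)$
and $b_n = b_n(k)$ could depend on a common parameter $k \in \mathbb{N}$. The two sequences are called
"asymptotically $\varepsilon$-equivalent" as $n \to \infty$, w.r.t. the collection $\{\mathbb{P}^{m_*} : m_* \in \mathrm{supp}(\pi_\phi^k)\}$, and
denoted

$$a_n \overset{\varepsilon}{\asymp} b_n,$$

if there exist polynomials $p(\cdot)$ and $q(\cdot)$ (with constant nonnegative coefficients), and $\varepsilon_0 > 0$,
such that for all $m_* \in \mathrm{supp}(\pi_\phi^k)$, we have

$$\mathbb{P}^{m_*}(|a_n - b_n| \le \varepsilon) \ge 1 - q(n)\, e^{-c\, n\, \varepsilon^2}$$

for all $n \in \mathbb{N}$ and $\varepsilon \in (0, \varepsilon_0)$ satisfying $\sqrt{n}\, \varepsilon \ge p(m_*, k)$. The one-sided version, e.g., $a_n \overset{\varepsilon}{\preceq} b_n$,
is defined by replacing $|a_n - b_n| \le \varepsilon$ with $a_n \le b_n + \varepsilon$. (The constants are independent of
$n$, $m_*$, $k$, and $\varepsilon$, but they could depend on other parameters of the problem.)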

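By application of the union bound and algebra, a finite number of asymptotic $\varepsilon$-equivalence
statements can be manipulated under some algebraic rules to produce new such statements.
Below, we summarize some of the rules:

(R1) $a_n \overset{\varepsilon}{\asymp} b_n$ implies $a_n \overset{C\varepsilon}{\asymp} b_n$ for $C > 0$ and $\alpha a_n \overset{\varepsilon}{\asymp} \alpha b_n$ for $\alpha \in \mathbb{R}$.

(R2) $a_n \overset{\varepsilon}{\asymp} b_n$ and $b_n \overset{\varepsilon}{\asymp} c_n$ implies $a_n \overset{\varepsilon}{\asymp} c_n$. (Transitivity)

(R3) $a_n \overset{\varepsilon}{\asymp} b_n$ and $c_n \overset{\varepsilon}{\asymp} d_n$ implies $a_n \pm c_n \overset{\varepsilon}{\asymp} b_n \pm d_n$.

(R4) $a_n \overset{\varepsilon}{\asymp} b_n$ implies $\max\{a_n, c_n\} \overset{\varepsilon}{\asymp} \max\{b_n, c_n\}$.

(R5) $a_n \overset{\varepsilon}{\asymp} b_n$, $c_n \overset{\varepsilon}{\asymp} 1$ and $\{b_n\}$ bounded implies $a_n c_n \overset{\varepsilon}{\asymp} b_n$.

(R6) $a_n \overset{\varepsilon}{\asymp} a > 0$ and $b_n \overset{\varepsilon}{\preceq} -b < 0$ implies $\max\{a_n, b_n\} \overset{\varepsilon}{\asymp} a$.

(R7) "log-sum-max" inequality for positive sequences $\{a_n\}$ and $\{b_n\}$:

$$n^{-1} \log(a_n + b_n) \overset{\varepsilon}{\asymp} \max\{n^{-1} \log a_n, n^{-1} \log b_n\}. \qquad (21)$$

The last statement follows from the inequalities $0 \le \log(a_n + b_n) - \max\{\log a_n, \log b_n\} \le \log 2$.
Dividing by $n$, we observe that the difference is bounded by $\varepsilon$, in absolute value, as long as
$n\varepsilon \ge \log 2$. This implies the condition in Definition 1, since $\{(n, \varepsilon) : \sqrt{n}\, \varepsilon \ge \log 2\} \subset \{(n, \varepsilon) :
n\varepsilon \ge \log 2\}$.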
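The deterministic bound behind the log-sum-max rule (R7) is easy to check numerically. The snippet below is a small illustration with an assumed helper name; it evaluates the gap $\log(a + b) - \max\{\log a, \log b\}$ on a grid of positive values spanning many orders of magnitude.

```python
import math

def log_sum_max_gap(a, b):
    """log(a + b) - max(log a, log b) for positive a, b; lies in [0, log 2]."""
    return math.log(a + b) - max(math.log(a), math.log(b))

# Grid of positive values spanning many orders of magnitude.
values = [10.0 ** e for e in range(-8, 9)]
gaps = [log_sum_max_gap(a, b) for a in values for b in values]
largest_gap = max(gaps)   # attained (up to rounding) when a == b
smallest_gap = min(gaps)  # close to 0 when one term dominates
```

Dividing the gap by $n$ then gives a difference of at most $(\log 2)/n$, which is the quantity compared with $\varepsilon$ in the argument above.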

As another example of how these rules are obtained, consider (R2). We have $|a_n - b_n| \le \varepsilon$
on an event $A_{1,n}$ having probability at least $1 - q_1(n) e^{-c_1 n \varepsilon^2}$, for $\sqrt{n}\, \varepsilon \ge p_1(m_*, k)$. Similarly,
$|b_n - c_n| \le \varepsilon$ on an event $A_{2,n}$ with probability at least $1 - q_2(n) e^{-c_2 n \varepsilon^2}$, for $\sqrt{n}\, \varepsilon \ge p_2(m_*, k)$.
Then, by the union bound, $A_{1,n} \cap A_{2,n}$ has probability at least $1 - (q_1(n) + q_2(n)) e^{-(c_1 \wedge c_2) n \varepsilon^2}$,
for $\sqrt{n}\, \varepsilon \ge p_1(m_*, k) + p_2(m_*, k)$. For this range of $n$, on the event $A_{1,n} \cap A_{2,n}$ we have both
$|a_n - b_n| \le \varepsilon$ and $|b_n - c_n| \le \varepsilon$, from which it follows that $|a_n - c_n| \le 2\varepsilon$, by the triangle inequality.
Since both $q_1 + q_2$ and $p_1 + p_2$ are polynomials, we have the desired assertion (after rescaling $\varepsilon$ as in (R1)).

Remark 1. According to Definition 1 and Lemma 4, to prove Theorem 1 it is enough to show
that

$$\frac{1}{n} \log D_\phi^{k,n} \overset{\varepsilon}{\asymp} I_\phi \quad \text{as } n \to \infty \text{ w.r.t. } \{\mathbb{P}^{m_*}\},$$

where $D_\phi^{k,n} := D_\phi^k(\mathbf{X}_*^n)$. (We often omit $m_* \in \mathrm{supp}(\pi_\phi^k)$ when it is implicitly understood.) The rules stated above
allow one to reduce the problem to asymptotic $\varepsilon$-equivalence statements for simpler terms,
as considered in the next section. In this context, we regard parameters of the priors, $\{p_j\}$,
and pre- and post-change densities as constants. In other words, the constants in the definition
of $\varepsilon$-equivalence can depend on $\{p_j\}$, $\{I_e\}$, and $M$ (the uniform norm of $\log(f_e/g_e)$).

We now introduce a couple of building blocks occurring frequently and establish $\overset{\varepsilon}{\asymp}$
statements for them. Recall that $g_e$ and $f_e$ denote the pre- and post-change densities for edge
$e \in \tilde{E}$. Define

$$R_k^n(e) := R_k^n(\mathbf{X}_e) := \prod_{t=k}^{n} \frac{f_e(X_e^t)}{g_e(X_e^t)} = \prod_{t=k}^{n} e^{h_e(X_e^t)}, \quad h_e := \log \frac{f_e}{g_e}. \qquad (22)$$

Note that by assumption $\|h_e\|_\infty \le M$ for all $e$. We will use the convention that empty
products evaluate to 1, that is, $R_k^n(e) = 1$ whenever $k > n$. We also define $S$-terms as

$$S_u^{v,n}(e) := \sum_{p=u}^{v} A e^{-\beta p} R_p^n(e) \qquad (23)$$

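where $A$ and $\beta$ are some positive constants. Similarly, define $M$ and $L$-terms as follows

$$M_u^{v,n}(e) := \sum_{p_1=u}^{v} \sum_{p_2=u}^{v} A e^{-(\beta_1 p_1 + \beta_2 p_2)} R_{p_1 \wedge p_2}^n(e) \qquad (24)$$

$$L_u^{v,n,(r)}(e) := \sum_{p=u}^{v} A e^{-\beta p} R_{p \wedge r}^n(e) \qquad (25)$$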

for constants $A, \beta_1, \beta_2, \beta > 0$. The constants involved in these definitions can be different
in each occurrence and we have suppressed them in the notation for simplicity. The $M$ and
$L$-terms are most relevant when $e$ is a proper edge, that is, $e = \{i, j\} \in E$ and $i \ne j$, although
the statements involving them hold in general.

The following theorem is proved in Section 8. Recall that $I_e$ is the KL divergence between
$f_e$ and $g_e$, that is, $I_e := \int f_e \log \frac{f_e}{g_e}\, d\mu$.

Theorem 2. Assume $\|\log \frac{f_e}{g_e}\|_\infty \le M$ for all $e \in \tilde{E}$. The following asymptotic $\varepsilon$-equivalence
relations hold with respect to $\{\mathbb{P}^{m_*} : m_* \in \mathrm{supp}(\pi_\phi^k)\}$, as $n \to \infty$,

ilo gj R™(e) £ ±logS~- n (e) £ ±k>gM~'*( e ) £ \**I%»{e) & I e (26) 

for any $u, r \le 2k$ and $e \in \tilde{E}$.

The proof of this theorem is deferred to Section 8. The $\varepsilon$-equivalence $\frac{1}{n} \log R_k^n(e) \overset{\varepsilon}{\asymp} I_e$
is intuitive as will become clear in the proof. The theorem essentially states that there are no
surprises regarding the $S$, $M$ and $L$ terms and they are all $\varepsilon$-equivalent to the corresponding edge
information. We also note that $2k$ in the statement of the theorem can be replaced with $Ck$
for any constant $C > 0$.
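The intuition that $\frac{1}{n} \log R_k^n(e)$ settles near $I_e$ can be seen in a quick simulation. The sketch below is an illustration only: it assumes Bernoulli pre- and post-change densities (so the log-likelihood ratio is bounded, as the theorem requires), assumes the change on edge $e$ occurs exactly at time $k$ so that all observations from time $k$ on are post-change, and uses hypothetical function names.

```python
import math
import random

def kl_bernoulli(theta_f, theta_g):
    """KL divergence between Bernoulli(theta_f) and Bernoulli(theta_g)."""
    return (theta_f * math.log(theta_f / theta_g)
            + (1 - theta_f) * math.log((1 - theta_f) / (1 - theta_g)))

def normalized_log_R(n, k, theta_f, theta_g, seed=0):
    """(1/n) * log R_k^n for post-change Bernoulli(theta_f) data at times k..n."""
    rng = random.Random(seed)
    h1 = math.log(theta_f / theta_g)              # log-ratio when x = 1
    h0 = math.log((1 - theta_f) / (1 - theta_g))  # log-ratio when x = 0
    total = 0.0
    for _ in range(k, n + 1):
        total += h1 if rng.random() < theta_f else h0
    return total / n

info = kl_bernoulli(0.7, 0.4)
estimate = normalized_log_R(n=20000, k=5, theta_f=0.7, theta_g=0.4)
```

With these parameters the estimate should be close to the KL information, with fluctuations of order $1/\sqrt{n}$, in line with the $\sqrt{n}\,\varepsilon$ scaling in Definition 1.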

Remark. Let us consider the role of our assumptions on the priors and likelihood ratios, by
giving a high-level overview of the proof of Theorem 2 for the $S$-terms. The exponential decay of
the tails of the priors is reflected in the definition of the $S$-terms in (23). The terms $R_p^n(e)$
in this sum are concentrated (on the normalized log scale) around $I_e$ if $p \ll n$, as in this case $R_p^n(e)$ is the product of
many essentially i.i.d. terms. For $p$ close to $n$, however, there is no guaranteed concentration
for $R_p^n(e)$, as it is a product of only a few random variables. For these terms, however, the
prefactor $e^{-\beta p}$ is small while $R_p^n(e)$ is guaranteed to be bounded (based on $\|\log(f_e/g_e)\|_\infty \le
M$). Hence these terms are negligible and do not contribute to $\frac{1}{n} \log$ of the $S$-term, asymptotically.
This argument is made precise in Section 8.

To simplify notation, from now on, we will drop the second upper index in the symbols 
for S, L and M terms, whenever this index is n and there is no chance of confusion. That is, 
we adhere to the following convention, 

Sfte) := S2»(e), L* (r) (e) := L^ r) (e), M^e) := M^(e). (27) 



7 Proof of the optimal delay theorem 

Let us define

$$M_\phi^{k,n} := M_\phi^k(\mathbf{X}_*^n) := \sum_{k_1, \dots, k_d} \pi_\phi^k(k_1, \dots, k_d) \prod_{j \in V} R_{k_j}^n\{j\} \prod_{e \in E} R_{k_e}^n(e) \qquad (28)$$

where $k_e := k_i \wedge k_j$ for $e = \{i, j\}$ and each variable $k_j$ runs over $\{1, 2, \dots\} \cup \{\infty\}$. The
inclusion of $\infty$ in the range of the summations does not affect the case $k < \infty$, but will allow us
to use the same expression (28) for $M_\phi^{\infty,n}$. We have the following easily verified representation of
$D_\phi^{k,n}$. (See Appendix B for the proof.)

Lemma 5. With $D_\phi^{k,n}$ defined as in (17),

$$D_\phi^{k,n} = \frac{M_\phi^{k,n}}{M_\phi^{\infty,n}}. \qquad (29)$$

We will use the following technical lemma to decouple sums of products. Let $\binom{[r]}{2}$ denote
the collection of 2-subsets of $[r] = \{1, \dots, r\}$, with the convention that each member is
denoted as an ordered pair $(i, j)$ with $i < j$.

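Lemma 6. Let $\mathcal{S} = \mathcal{S}_1 \times \mathcal{S}_2 \times \cdots \times \mathcal{S}_r$ be the Cartesian product of $r$ countable sets $\mathcal{S}_1, \dots, \mathcal{S}_r$
and let $\mathbf{k} = (k_1, \dots, k_r)$ be a multi-index for $\mathcal{S}$. Let $F_j$ and $G_{ij}$ be nonnegative functions
defined on $\mathcal{S}_j$ and $\mathcal{S}_i \times \mathcal{S}_j$ respectively, for $i, j \in [r]$. Let $H_i$ be a nonnegative function on $\mathcal{S}_i$.
Let $\beta = (\beta_1, \dots, \beta_r) \in \mathbb{R}_+^r$. Then,

$$\text{(a)} \quad \sum_{k_i} e^{-\beta_i k_i} F_i(k_i) H_i(k_i) \le \Big( \sum_{k_i} e^{-\beta_i k_i / 2} F_i(k_i) \Big) \Big( \sum_{k_i} e^{-\beta_i k_i / 2} H_i(k_i) \Big). \qquad (30)$$

$$\text{(b)} \quad \sum_{\mathbf{k}} \Big\{ e^{-\beta^T \mathbf{k}} \prod_{j} F_j(k_j) \prod_{(i,j)} G_{ij}(k_i, k_j) \Big\} \le \Big( \prod_{j} \Big\{ \sum_{k_j} e^{-\beta_j k_j / r} F_j(k_j) \Big\} \Big) \times \prod_{(i,j)} \Big\{ \sum_{k_i, k_j} e^{-(\beta_i k_i + \beta_j k_j)/r}\, G_{ij}(k_i, k_j) \Big\},$$

where $j$ ranges over $[r]$ and $(i, j)$ over $\binom{[r]}{2}$.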

The key in this lemma is that the functions $F_j$, $G_{ij}$ and $H_i$ are nonnegative. One might
already see how the application of Lemma 6 to the sum in (28) produces $S$ and $M$ terms as
introduced in Section 6. We are ready to give the proof of Theorem 1. We start with the two
extreme change point functionals $\lambda_S$: a single change point ($|S| = 1$), and the minimum of all
the change points ($S = [d]$). Then, we present the proof for $\lambda_S$ with $1 < |S| < d$, omitting
some of the details for brevity.

7.1 Proof for the case $\phi = \lambda_1 \wedge \cdots \wedge \lambda_d$

First, note that in this case $M_\phi^{\infty,n} = 1$, since $\phi = \infty$ implies $\lambda_1 = \cdots = \lambda_d = \infty$. Hence, we
only need to consider $M_\phi^{k,n}$ for some $k < \infty$. We then observe that $\pi_\phi^k(k_1, \dots, k_d)$ is nonzero
only when at least one of $k_1, \dots, k_d$ is equal to $k$. We break up the sum according to how
many of $k_1, \dots, k_d$ are equal to $k$.

Let $J$ be a subset of $[d]$ of size $|J| = s$. Let $J^c = [d] \setminus J$. Consider the terms in the sum (28)
for which $k_j = k$ for $j \in J$ and $k_j > k$ for $j \in J^c$. We call the sum over these terms $T_J$. Then,
$M_\phi^{k,n} = \sum_{J : |J| \ge 1} T_J = \sum_{s=1}^{d} \sum_{J : |J| = s} T_J$, where the sum is over all subsets $J$ of $[d]$ of size at
least 1.

Let us fix some $s \in [d]$ and some $J \subset [d]$ with $|J| = s$. Without loss of generality, we can
pick $J = \{1, \dots, s\}$. We note that



TT$(k,...,k,k s+1 ,...,kd)=A Yl ~Pj 

j=s+l 

= Ae~ ^e 3C P jkj , for kj > k,j G 3 C 
where (3j = — log p./ > 0, and A = A({pj}) is some constant. It follows that 

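$$T_{\mathcal J} = \prod_{j\in\mathcal J} R_k^n\{j\} \prod_{|e\cap\mathcal J|\ge 1} R_k^n(e)\; \underbrace{\sum_{k_j > k,\ j\in\mathcal J^c} \Big\{ A\, e^{-\sum_{j\in\mathcal J^c}\beta_j k_j} \prod_{j\in\mathcal J^c} R_{k_j}^n\{j\} \prod_{\substack{e=\{i,j\}:\\ |e\cap\mathcal J| = 0}} R_{k_i\wedge k_j}^n(e) \Big\}}_{(*)}. \tag{32}$$

Here and in the rest of the proof, the index $e$ runs in the set $E$ of original edges (not the modified set $\widetilde E$ introduced in Section 4). That is, each edge $e = \{i,j\} \in E$ for some $i \ne j$. Note that in (32), the rightmost product is over all 2-subsets of $\mathcal J^c$, which we denote as $\binom{\mathcal J^c}{2}$. We can now apply the first part of Lemma 6, with $r = |\mathcal J^c| = d - s$, to obtain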



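$$(*) \le A \prod_{j\in\mathcal J^c} \underbrace{\Big\{\sum_{k_j > k} e^{-\frac{\beta_j}{d-s} k_j} R_{k_j}^n\{j\}\Big\}}_{(**)} \prod_{(i,j)\in\binom{\mathcal J^c}{2}} \underbrace{\Big\{\sum_{k_i,k_j > k} e^{-\frac{1}{d-s}(\beta_i k_i+\beta_j k_j)} R_{k_i\wedge k_j}^n\{i,j\}\Big\}}_{(***)}.$$

Each term denoted as (**) is of the form $S_{k+1}^{\infty}\{j\}$ and each term denoted as (***) is of the form $M_{k+1}^{\infty}\{i,j\}$. Hence, we have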

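$$T_{\mathcal J} \le A \prod_{j\in\mathcal J} R_k^n\{j\} \prod_{|e\cap\mathcal J|\ge1} R_k^n(e) \prod_{j\in\mathcal J^c} S_{k+1}^{\infty}\{j\} \prod_{|e\cap\mathcal J|=0} M_{k+1}^{\infty}(e).$$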

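Applying Theorem 2 to each of the R, S and M forms above, we obtain
$$\frac1n \log T_{\mathcal J} \overset{\varepsilon}{\preceq} \sum_{j\in\mathcal J} I_j + \sum_{|e\cap\mathcal J|\ge1} I_e + \sum_{j\in\mathcal J^c} I_j + \sum_{|e\cap\mathcal J|=0} I_e = \sum_{j\in[d]} I_j + \sum_{e\in E} I_e, \tag{33}$$
where the $\varepsilon$-equivalence above and in what follows is w.r.t. $\{\mathbb P^{m_*}\}$.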
To obtain the lower bound, we bound (*) from below by its first term,
$$(*) \ge A\, e^{-(k+1)\sum_{j\in\mathcal J^c}\beta_j} \prod_{j\in\mathcal J^c} R_{k+1}^n\{j\} \prod_{|e\cap\mathcal J|=0} R_{k+1}^n(e),$$
which, after applying Theorem 2, gives us a lower bound on $\frac1n \log T_{\mathcal J}$ matching the RHS of (33). Finally, note that the RHS of (33) does not depend on the particular choice of $\mathcal J$. We now use the log-sum-max rule (R7) to get



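$$\frac1n \log M_\phi^{k,n} = \frac1n \log\Big(\sum_{\mathcal J:\,|\mathcal J|\ge1} T_{\mathcal J}\Big) \overset{\varepsilon}{\asymp} \max_{\mathcal J:\,|\mathcal J|\ge1} \frac1n \log T_{\mathcal J},$$
which is the desired result.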
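The log-sum-max step can be illustrated numerically. The sketch below assumes the rule is used in the following sense: for a fixed, finite number of terms of the form $e^{n a_i}$, the normalized logarithm of their sum approaches $\max_i a_i$, with a gap of at most $\log(\text{number of terms})/n$. The function name `log_sum_rate` and the sample rates are hypothetical.

```python
import math

def log_sum_rate(rates, n):
    """Compute (1/n) * log(sum_i exp(n * rates[i])) stably via the log-sum-exp trick."""
    top = max(rates)
    # Factor out the dominant term so that no exponential overflows.
    return top + math.log(sum(math.exp(n * (r - top)) for r in rates)) / n

rates = [0.2, 0.5, -0.1]  # exponential growth rates of three hypothetical terms
for n in (10, 100, 1000):
    print(n, log_sum_rate(rates, n))  # approaches max(rates) as n grows
```

In the proof above the number of terms (subsets $\mathcal J$) is fixed in $n$, which is what makes the $\log(\cdot)/n$ correction vanish.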

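7.2 Proof for the case $\phi = \lambda_1$

In this case, one has $\pi_\phi^k(k_1,\dots,k_d) = 1\{k_1 = k\} \prod_{j=2}^d \pi_j(k_j)$, hence
$$\pi_\phi^k(k_1,\dots,k_d) = 1\{k_1 = k\}\, A\, e^{-\sum_{j=2}^d \beta_j k_j},$$
where $\beta_j = -\log\rho_j > 0$ and $A = A(\{\rho_j\}) > 0$ is some constant. Then, we can write
$$M_\phi^{k,n} = R_k^n\{1\} \sum_{k_2,\dots,k_d} \Big\{ A\, e^{-\sum_{j=2}^d \beta_j k_j} \Big[\prod_{j=2}^d R_{k_j}^n\{j\}\, R_{k\wedge k_j}^n\{1,j\}\Big] \prod_{\substack{e=\{i,j\}:\\ 1\notin e}} R_{k_i\wedge k_j}^n(e) \Big\}. \tag{34}$$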

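Note that the second product runs over all 2-subsets of $[2:d] := \{2,\dots,d\}$, which we denote as $\binom{[2:d]}{2}$. Hence, we can apply Lemma 6 with $r = d-1$ to obtain
$$\frac{M_\phi^{k,n}}{A\,R_k^n\{1\}} \le \prod_{j=2}^d \Big\{\sum_{k_j} e^{-\frac{\beta_j}{d-1}k_j} R_{k_j}^n\{j\}\,R_{k\wedge k_j}^n\{1,j\}\Big\} \prod_{(i,j)\in\binom{[2:d]}{2}} \Big\{\sum_{k_i,k_j} e^{-\frac{1}{d-1}(\beta_i k_i+\beta_j k_j)} R_{k_i\wedge k_j}^n\{i,j\}\Big\}.$$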

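Each term appearing in the second product is of the form $M_1^{\infty}\{i,j\}$. Applying the second half of Lemma 6 to the first product, we get
$$\frac{M_\phi^{k,n}}{A\,R_k^n\{1\}} \le \prod_{j=2}^d \Big\{\underbrace{\Big(\sum_{k_j} e^{-\frac{\beta_j}{2(d-1)}k_j} R_{k_j}^n\{j\}\Big)}_{(*)}\, \underbrace{\Big(\sum_{k_j} e^{-\frac{\beta_j}{2(d-1)}k_j} R_{k\wedge k_j}^n\{1,j\}\Big)}_{(**)}\Big\} \prod_{(i,j)\in\binom{[2:d]}{2}} M_1^{\infty}\{i,j\}. \tag{35}$$
Each term denoted as (*) is of the form $S_1^{\infty}\{j\}$. For $k < \infty$, each term denoted as (**) can be written in the form $L_{1,k}\{1,j\}$. That is,
$$\frac{M_\phi^{k,n}}{A\,R_k^n\{1\}} \le \prod_{j=2}^d \big\{S_1^{\infty}\{j\}\, L_{1,k}\{1,j\}\big\} \prod_{(i,j)\in\binom{[2:d]}{2}} M_1^{\infty}\{i,j\}.$$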



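Applying Theorem 2 to each of the R, S, L and M forms above, we obtain
$$\frac1n \log M_\phi^{k,n} \overset{\varepsilon}{\preceq} I_1 + \sum_{j=2}^d (I_j + I_{1j}) + \sum_{(i,j)\in\binom{[2:d]}{2}} I_{ij}, \tag{36}$$
where the $\varepsilon$-equivalence above and in what follows is w.r.t. $\{\mathbb P^{m_*}\}$.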

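The lower bound is obtained, as in Section 7.1, by bounding the sum in (34) by its first term (i.e., $k_2 = k_3 = \cdots = k_d = 1$),
$$M_\phi^{k,n} \ge R_k^n\{1\}\, A\, e^{-\sum_{j=2}^d \beta_j} \Big[\prod_{j=2}^d R_1^n\{j\}\,R_1^n\{1,j\}\Big] \prod_{e:\,1\notin e} R_1^n(e).$$
Applying $\frac1n\log(\cdot)$ and using Theorem 2 for each term, we get a lower bound matching the RHS of (36). That is, the bound in (36) holds with $\overset{\varepsilon}{\preceq}$ replaced with $\overset{\varepsilon}{\asymp}$.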

Now consider the denominator of $D_\phi^{k,n}$, namely $M_\phi^{\infty,n}$. An upper bound on $M_\phi^{\infty,n}$ can be obtained by letting $k = \infty$ in (35). We note that $R_\infty^n\{1\} = 1$ and that (**) is now a term of the form $S_1^{\infty}\{1,j\}$. Proceeding as before, we obtain an upper bound similar to that of (36), with $I_1$ missing from the bound. The lower bound is obtained by the same technique. Hence,
$$\frac1n \log M_\phi^{\infty,n} \overset{\varepsilon}{\asymp} \sum_{j=2}^d (I_j + I_{1j}) + \sum_{(i,j)\in\binom{[2:d]}{2}} I_{ij}. \tag{37}$$
Combining the equality form of (36) and (37), we have
$$\frac1n \log D_\phi^{k,n} = \frac1n \log M_\phi^{k,n} - \frac1n \log M_\phi^{\infty,n} \overset{\varepsilon}{\asymp} I_1, \tag{38}$$
which is the desired result.



7.3 Proof for $\lambda_{\mathcal S}$ with $1 < |\mathcal S| < d$

We now briefly give the proof for the remaining cases. Without loss of generality, we assume $\mathcal S = \{1,2,\dots,r\}$ for some $r \in \{2,\dots,d-1\}$. In other words, the delay functional is $\phi = \lambda_{\mathcal S} = \lambda_1 \wedge \lambda_2 \wedge \cdots \wedge \lambda_r$. We observe that $\pi_\phi^k(k_1,\dots,k_d)$ is nonzero when all of $k_1,\dots,k_r$ are $\ge k$, while at least one of them is equal to $k$. Consider $M_\phi^{k,n}$ for $k < \infty$. As in Section 7.1, we break up the sum in its definition according to how many of $k_1,\dots,k_r$ are equal to $k$.

Let $\mathcal J$ be a subset of $\mathcal S = [r]$ of size $|\mathcal J| = s \le r$. Let $\mathcal L := \mathcal S \setminus \mathcal J$ and $\mathcal S^c := [d] \setminus \mathcal S$. Note that $\{\mathcal J, \mathcal L, \mathcal S^c\}$ form a partition of the index set $[d]$. To simplify notation, let $\mathcal J^c = [d] \setminus \mathcal J$ and note that $\mathcal J^c = \mathcal L \cup \mathcal S^c$.

Consider the terms in the sum (28) for which $k_j = k$ for $j \in \mathcal J$ and $k_j > k$ for $j \in \mathcal L$. We call the sum over these terms $T_{\mathcal J}$. Then, $M_\phi^{k,n} = \sum_{s=1}^r \sum_{\mathcal J:\,|\mathcal J| = s} T_{\mathcal J}$.

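Now fix some $s \in [r]$ and some $\mathcal J \subset \mathcal S$ with $|\mathcal J| = s$. The R-terms in the expression of $T_{\mathcal J}$ corresponding to nodes are easy to deal with. For the R-terms corresponding to edges, we first break them into three categories, based on how many of the endpoints are in $\mathcal J$ (i.e., $|e\cap\mathcal J| = 0, 1, 2$). The case where exactly one endpoint is in $\mathcal J$ (i.e., $|e\cap\mathcal J| = 1$) is further broken into two cases based on whether the other endpoint is in $\mathcal L$ or in $\mathcal S^c$. The former case, i.e. $|e\cap\mathcal J| = |e\cap\mathcal L| = 1$, behaves the same as the case $|e\cap\mathcal J| = 2$. We thus combine these two cases, denoted as $|e\cap\mathcal J| \ge 1$, $e \subset \mathcal S$. To summarize, we break the edges into a total of three categories. We get the following decomposition
$$T_{\mathcal J} = \prod_{j\in\mathcal J} R_k^n\{j\} \prod_{\substack{|e\cap\mathcal J|\ge1,\\ e\subset\mathcal S}} R_k^n(e) \times \sum_{\substack{k_j>k,\ j\in\mathcal L\\ k_j\ge1,\ j\in\mathcal S^c}} \Big\{ A\,e^{-\sum_{j\in\mathcal J^c}\beta_j k_j} \underbrace{\prod_{j\in\mathcal J^c} R_{k_j}^n\{j\}}_{(*)}\; \underbrace{\prod_{\substack{|e\cap\mathcal J|=1,\\ e\cap\mathcal S^c=\{\ell\}}} R_{k\wedge k_\ell}^n(e)}_{(**)}\; \underbrace{\prod_{\substack{e=\{i,j\}:\\ |e\cap\mathcal J|=0}} R_{k_i\wedge k_j}^n(e)}_{(***)} \Big\}. \tag{39}$$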

As in Sections 7.1 and 7.2, we can apply Lemma 6 to decouple the sum and obtain an upper bound on $T_{\mathcal J}$. The products denoted by (*), (**) and (***) produce S-, L-, and M-terms (see footnote 1), respectively. Using the same lower bounding technique and applying Theorem 2, we obtain
$$\frac1n \log T_{\mathcal J} \overset{\varepsilon}{\asymp} \sum_{j\in\mathcal J} I_j + \sum_{\substack{|e\cap\mathcal J|\ge1,\\ e\subset\mathcal S}} I_e + \sum_{j\in\mathcal J^c} I_j + \sum_{\substack{|e\cap\mathcal J|=1,\\ |e\cap\mathcal S^c|=1}} I_e + \sum_{|e\cap\mathcal J|=0} I_e.$$
Since this expression does not depend on $\mathcal J$, using the log-sum-max rule as before, we obtain that $\frac1n \log M_\phi^{k,n} \overset{\varepsilon}{\asymp} \sum_{j\in[d]} I_j + \sum_{e\in E} I_e$.

1 Strictly speaking, some of the terms produced by (***) will have the form of an M-term in the extended sense to be introduced in (44). For example, we will have M-terms of the form $M_{k+1,1}^{\infty}(e)$. Since every term of the sum is nonnegative, we have the inequality $M_{k+1}^{\infty}(e) \le M_{k+1,1}^{\infty}(e) \le M_1^{\infty}(e)$, which in view of Theorem 2 implies $\frac1n \log M_{k+1,1}^{\infty}(e) \overset{\varepsilon}{\asymp} I_e$.

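Now, we need to analyze $M_\phi^{\infty,n}$. We try to break up the sum as before into $T_{\mathcal J}$ terms (defined similarly to $T_{\mathcal J}$ for $M_\phi^{k,n}$). This time however, we only need to consider $\mathcal J = \mathcal S$ (and $\mathcal L$ the empty set), because $\phi = \infty$ implies $\lambda_j = \infty$ for all $j \in \mathcal S$. The expansion for $T_{\mathcal S}$ can be obtained from (39) by setting $k = \infty$ and removing the terms corresponding to indices in $\mathcal S = \mathcal J \cup \mathcal L$,
$$M_\phi^{\infty,n} = T_{\mathcal S} = \sum_{k_j\ge1,\ j\in\mathcal S^c} \Big\{ A\,e^{-\sum_{j\in\mathcal S^c}\beta_j k_j} \prod_{j\in\mathcal S^c} R_{k_j}^n\{j\} \prod_{\substack{|e\cap\mathcal S|=1,\\ e\cap\mathcal S^c=\{\ell\}}} R_{k_\ell}^n(e) \prod_{\substack{e=\{i,j\}:\\ |e\cap\mathcal S|=0}} R_{k_i\wedge k_j}^n(e) \Big\}.$$
It follows that
$$\frac1n \log M_\phi^{\infty,n} \overset{\varepsilon}{\asymp} \sum_{j\in\mathcal S^c} I_j + \sum_{\substack{|e\cap\mathcal S|=1,\\ |e\cap\mathcal S^c|=1}} I_e + \sum_{|e\cap\mathcal S|=0} I_e.$$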

The last two sums can be described as the sum over all edges $e$ with $e\cap\mathcal S^c \ne \emptyset$. Putting the pieces together, we have
$$\frac1n \log D_\phi^{k,n} \overset{\varepsilon}{\asymp} \sum_{j\in[d]} I_j + \sum_{e\in E} I_e - \Big(\sum_{j\in\mathcal S^c} I_j + \sum_{e\cap\mathcal S^c\ne\emptyset} I_e\Big) = \sum_{j\in\mathcal S} I_j + \sum_{e\subset\mathcal S} I_e,$$
as desired.

8 Proof of Theorem 2

Let us start by understanding the asymptotic behavior of $\frac1n \log R_u^n(e)$. Throughout, we fix $e \in E$. We either have $e = \{j\}$ in which case $\lambda_e = \lambda_j$, or $e = \{i,j\}$ in which case $\lambda_e = \lambda_i \wedge \lambda_j$. Recall that $m_* = (m_1,\dots,m_d) \in \mathbb N^d$ is a multi-index, and we will work under the collection $\{\mathbb P^{m_*}\}$ of conditional distributions (see Definition 1 for details). The same convention is used regarding the meaning of $m_e$, that is, $m_e = m_j$ for $e = \{j\}$, and $m_e = m_i \wedge m_j$ for $e = \{i,j\}$. We also fix some $k \in \mathbb N$, which is the parameter $k$ appearing in Definition 1 (reserved for the ultimate conditioning on $\{\phi = k\}$). Finally, we always assume $\varepsilon \in (0,1)$.

At first, we need to be careful about whether $u < m_e$ or $u \ge m_e$.
Lemma 7. Let $u \in [n]$ and assume $u \ge m_e$. Then,
$$\mathbb P^{m_*}\Big(\Big|\frac{1}{n-u+1}\log R_u^n(e) - I_e\Big| > \varepsilon\Big) \le 2\exp\Big(-\frac{(n-u+1)\varepsilon^2}{2M^2}\Big). \tag{40}$$

Proof. Since $m_e \le u$, under $\mathbb P^{m_*}$, the variables $X_e^u, X_e^{u+1}, \dots$ are i.i.d. from $f_e$. Recalling definition (22), $\log R_u^n(e) = \sum_{t=u}^n h_e(X_e^t)$, which is a sum of $(n-u+1)$ i.i.d. bounded variables $h_e(X_e^t) \in [-M, M]$ with mean $\mathbb E_{f_e} h_e(X_e^t) = I_e$. The result then follows from Hoeffding's inequality. □
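The Hoeffding bound used above can be compared with simulation. The sketch below is an illustration under stated assumptions: the summands are drawn uniformly from $[-M, M]$ (so their mean is 0 rather than a Kullback-Leibler quantity $I_e$), the bound is taken in the form $2\exp(-N\varepsilon^2/(2M^2))$ for $N$ terms, and the function names are hypothetical.

```python
import math
import random

def hoeffding_bound(num_terms, eps, M):
    """Two-sided Hoeffding bound for the mean of num_terms i.i.d. variables in [-M, M]."""
    return 2.0 * math.exp(-num_terms * eps ** 2 / (2.0 * M ** 2))

def empirical_deviation_freq(num_terms, eps, M, trials=2000, seed=0):
    """Fraction of trials in which the sample mean deviates from its expectation by more than eps."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Uniform(-M, M) summands: bounded by M in absolute value, with mean 0.
        mean = sum(rng.uniform(-M, M) for _ in range(num_terms)) / num_terms
        if abs(mean) > eps:
            hits += 1
    return hits / trials

N, eps, M = 200, 0.2, 1.0
print(empirical_deviation_freq(N, eps, M), hoeffding_bound(N, eps, M))
```

The empirical frequency is typically far below the bound, since Hoeffding's inequality only uses the range of the summands and not their variance.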

Before moving on, we need an extension of Definition 1. We need to deal with intermediate sequences whose terms depend possibly on $m_*$ (in addition to $k$). There is nothing to preclude such dependence in Definition 1. Hence, we use the same definition for $\varepsilon$-equivalence of such sequences with respect to the collection $\{\mathbb P^{m_*}\}$. Note that for any $u, v \in \mathbb N$, we can write
$$R_u^n(e) = R_u^{v-1}(e)\, R_{u\vee v}^n(e), \tag{41}$$
which holds irrespective of whether $u \ge v$ or $u < v$.
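The product decomposition can be checked directly on a toy sequence. The sketch below assumes that $R_u^v(e)$ is the product of per-time likelihood ratios over $t = u,\dots,v$, with the empty product equal to 1 when $u > v$; the helper `R` and the sample ratios are hypothetical.

```python
import math

def R(ratios, u, v):
    """Product of ratios[t] for t = u..v (1-indexed); equals 1 (empty product) when u > v."""
    return math.prod(ratios[t - 1] for t in range(u, v + 1))

ratios = [1.2, 0.8, 1.5, 0.9, 1.1, 1.3]  # hypothetical per-time likelihood ratios
n = len(ratios)
for u in range(1, n + 1):
    for v in range(1, n + 1):
        # R_u^n = R_u^{v-1} * R_{max(u, v)}^n, whether u >= v or u < v.
        assert math.isclose(R(ratios, u, n), R(ratios, u, v - 1) * R(ratios, max(u, v), n))
print("decomposition verified for all 1 <= u, v <=", n)
```

When $u \ge v$ the first factor is an empty product and the identity is trivial; when $u < v$ it simply splits the product at time $v$.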

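Lemma 8. For any $u \in [k]$, $\frac1n \log R_{u\vee m_e}^n(e) \overset{\varepsilon}{\asymp} I_e$ as $n\to\infty$ with respect to $\{\mathbb P^{m_*}\}$.

Lemma 9. For any $u \in \mathbb N$, $\frac1n \log R_u^{m_e-1}(e) \overset{\varepsilon}{\asymp} 0$ as $n\to\infty$ with respect to $\{\mathbb P^{m_*}\}$.

Lemma 10. For any $u \in [k]$, $\frac1n \log R_u^n(e) \overset{\varepsilon}{\asymp} I_e$ as $n\to\infty$ with respect to $\{\mathbb P^{m_*}\}$.

The last lemma proves the statement in Theorem 2 regarding the asymptotic behavior of $\frac1n \log R_u^n(e)$ for $u \in [k]$.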

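Proof of Lemma 8. Apply Lemma 7 with $u$ replaced with $u \vee m_e$. Since $\varepsilon < 1$ and $u\vee m_e \le k\vee m_e$, the RHS of (40) is further bounded above by
$$2\exp\Big(\frac{(k\vee m_e)\varepsilon}{2M^2}\Big)\exp\Big(-\frac{n\varepsilon^2}{2M^2}\Big) \le 2\exp\Big(-\frac{n\varepsilon^2}{4M^2}\Big)$$
as long as $(k\vee m_e)\varepsilon \le n\varepsilon^2/2$ or equivalently $n\varepsilon \ge 2(k\vee m_e)$. (This same condition guarantees $u\vee m_e \in [n]$, justifying the application of Lemma 7.) The condition obtained is of the form required by Definition 1, since $2(k\vee m_e)$ is bounded above by a polynomial, say $2(k + m_i)$ if $e = \{i,j\}$. This shows that
$$\frac{1}{n - u\vee m_e + 1}\log R_{u\vee m_e}^n(e) \overset{\varepsilon}{\asymp} I_e \quad \text{w.r.t. } \{\mathbb P^{m_*}\}.$$
Now, note that $\big|\frac{n - u\vee m_e+1}{n} - 1\big| \le \frac{u\vee m_e}{n} \le \frac{k\vee m_e}{n}$, which can be made $\le \varepsilon$ by choosing $n\varepsilon \ge (k\vee m_e)$. This implies that $\frac{n - u\vee m_e+1}{n} \overset{\varepsilon}{\asymp} 1$. Applying rule (REJ, with $a_n = \frac{1}{n-u\vee m_e+1}\log R_{u\vee m_e}^n(e)$, $b_n = I_e$ and $c_n = \frac{n-u\vee m_e+1}{n}$, we obtain the desired result. □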



Proof of Lemma 9. If $u > m_e - 1$, we have by definition $R_u^{m_e-1}(e) = 1$ and there is nothing to show. Otherwise, by the boundedness assumption $\|h\|_\infty \le M$, we have
$$e^{-Mm_e} \le e^{-M(m_e-u)} \le R_u^{m_e-1}(e) \le e^{M(m_e-u)} \le e^{Mm_e}.$$
Hence, by taking $n\varepsilon \ge Mm_e$, we have $\big|\frac1n \log R_u^{m_e-1}(e)\big| \le \varepsilon$, which implies the result. □

Proof of Lemma 10. Apply (41) with $v = m_e$, to obtain
$$\frac1n \log R_u^n(e) = \frac1n \log R_u^{m_e-1}(e) + \frac1n \log R_{u\vee m_e}^n(e).$$
The result now follows from Lemmas 8 and 9 and rule (R3). □

8.1 Bounding S-terms

Bounding S-terms is perhaps the most elaborate part of the proof. We start with a uniformization of Lemma 7 and then proceed in steps, working on various parts of the sum $S_u^{\infty}(e) := S_u^{\infty,n}(e)$ one at a time. Up to Lemma 15, we will use the shorthand notation introduced in (27), with the $n$ superscript dropped. It might help to recall that in this notation, $u$ and $\infty$ are the initial and final indices of the sum, respectively. Also, the edge $e \in E$ is fixed throughout.

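Lemma 11. Let $u \in \mathbb N$ and $\alpha \in (0,1)$ such that $m_e \le u \le \lfloor\alpha n\rfloor$. Then,
$$\sup_{u\le p\le\lfloor\alpha n\rfloor}\Big|\frac{1}{n-p+1}\log R_p^n(e) - I_e\Big| \le \varepsilon$$
with $\mathbb P^{m_*}$-probability at least $1 - 2(\lfloor\alpha n\rfloor - u + 1)\exp\big[-\frac{\varepsilon^2((1-\alpha)n+1)}{2M^2}\big]$.

Lemma 12. Let $u \in [k]$ and $\alpha \in (0,1)$ such that $m_e \le u \le \lfloor\alpha n\rfloor$. Then
$$\mathbb P^{m_*}\Big(\Big|\frac1n\log S_u^{\lfloor\alpha n\rfloor}(e) - I_e\Big| \le 2\varepsilon\Big) \ge 1 - 2n\exp\Big(-\frac{1-\alpha}{2M^2}\,n\varepsilon^2\Big)$$
for $n\varepsilon \ge c_0 k$ and $\varepsilon \in (0,1)$.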

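Lemma 13. Let $\delta \in (0,1)$ and $\alpha = \frac{\delta\beta+M}{\beta+M}$. Then, for $n \ge n_0(A,\beta,M,\delta)$,
$$\frac1n \log S_{\lfloor\alpha n\rfloor+1}^n(e) \le -\frac{\delta\beta}{2}.$$

Lemma 14. Let $\alpha = \frac{\beta+2M}{2(\beta+M)}$ and $u \in [k]$. Then, $\frac1n\log S_{u\vee m_e}^{\lfloor\alpha n\rfloor}(e) \overset{\varepsilon}{\asymp} I_e$ as $n\to\infty$ w.r.t. $\{\mathbb P^{m_*}\}$.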



Lemma 15. For any $u \in [k]$, we have $\frac1n \log S_{u\vee m_e}^n(e) \overset{\varepsilon}{\asymp} I_e$ as $n\to\infty$ w.r.t. $\{\mathbb P^{m_*}\}$.

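Proof of Lemma 11. We note that for any $p = u, u+1, \dots, \lfloor\alpha n\rfloor$, Lemma 7 applies. We can further upper-bound the RHS of (40) by
$$2\exp\Big(-\frac{(n-p+1)\varepsilon^2}{2M^2}\Big) \le 2\exp\Big(-\frac{(n-\alpha n+1)\varepsilon^2}{2M^2}\Big).$$
The result follows by applying the union bound. □

Proof of Lemma 12. By Lemma 11, uniformly over $p = u,\dots,\lfloor\alpha n\rfloor$, we have
$$e^{(n-p+1)(I_e-\varepsilon)} \le R_p^n(e) \le e^{(n-p+1)(I_e+\varepsilon)} \tag{42}$$
with $\mathbb P^{m_*}$-probability at least $1 - 2n\exp\big(-\frac{(1-\alpha)n\varepsilon^2}{2M^2}\big)$. (Note that this is a further lower bound w.r.t. that of Lemma 11.) On the event that (42) holds, we have
$$S_u^{\lfloor\alpha n\rfloor}(e) \le \sum_{p=u}^{\lfloor\alpha n\rfloor} A e^{-\beta p}e^{(n-p+1)(I_e+\varepsilon)} \le A e^{(n+1)(I_e+\varepsilon)}\sum_{p=1}^\infty e^{-(\beta+I_e)p}.$$
Take $C_1 := \max\big\{0, \log\frac{A}{e^{\beta+I_e}-1}\big\}$. Then,
$$\frac1n\log S_u^{\lfloor\alpha n\rfloor}(e) \le \frac{C_1}{n} + \frac{n+1}{n}(I_e+\varepsilon) \le I_e + 2\varepsilon$$
as long as $n\varepsilon \ge C_1 + I_e + 1$ (and $\varepsilon < 1$). To get the lower bound, we note that (42) implies
$$S_u^{\lfloor\alpha n\rfloor}(e) \ge \sum_{p=u}^{\lfloor\alpha n\rfloor} A e^{-\beta p}e^{(n-p+1)(I_e-\varepsilon)} \ge A e^{(n+1)(I_e-\varepsilon)}e^{-(\beta+I_e)u},$$
where we have lower bounded a sum of nonnegative terms by its first term. Hence,
$$\frac1n\log S_u^{\lfloor\alpha n\rfloor}(e) \ge \frac{n+1}{n}(I_e-\varepsilon) - \frac{|\log A| + (\beta+I_e)k}{n} \ge I_e - 2\varepsilon$$
as long as $n\varepsilon \ge 1 + |\log A| + (\beta+I_e)k$. □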
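The sandwich argument for the S-term can be reproduced numerically in an idealized setting. The sketch below assumes the noiseless case $\varepsilon = 0$, i.e. it replaces $R_p^n(e)$ by exactly $e^{(n-p+1)I}$, and checks that the resulting sum lies between its first term and the geometric-series bound, so that its normalized logarithm approaches $I$. All parameter values and the name `log_s_term` are hypothetical.

```python
import math

def log_s_term(n, u, alpha, A, beta, I):
    """log of sum_{p=u}^{floor(alpha*n)} A*exp(-beta*p)*exp((n-p+1)*I), via log-sum-exp."""
    upper = math.floor(alpha * n)
    logs = [math.log(A) - beta * p + (n - p + 1) * I for p in range(u, upper + 1)]
    top = max(logs)
    return top + math.log(sum(math.exp(x - top) for x in logs))

A, beta, I, u, alpha = 2.0, 0.4, 0.3, 3, 0.5
for n in (50, 500, 5000):
    log_s = log_s_term(n, u, alpha, A, beta, I)
    # Lower bound: the first term (p = u) of a sum of nonnegative terms.
    first_term = math.log(A) - beta * u + (n - u + 1) * I
    # Upper bound: extend the sum to a full geometric series over p >= 1.
    geometric = math.log(A) + (n + 1) * I - math.log(math.exp(beta + I) - 1.0)
    assert first_term <= log_s <= geometric
    print(n, log_s / n)  # approaches I as n grows
```

Both bounds differ from $nI$ by terms that do not grow with $n$, which is why the rate is exactly $I$ in this idealized case.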

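Proof of Lemma 13. By the boundedness assumption $\|h\|_\infty \le M$, we have $R_p^n(e) \le e^{(n-p+1)M}$ as long as $p \le n$. Hence,
$$S_{\lfloor\alpha n\rfloor+1}^n(e) \le \sum_{p=\lfloor\alpha n\rfloor+1}^n A\,e^{-\beta p}e^{(n-p+1)M} \le A\,e^{(n+1)M}(n-\alpha n+1)\,e^{-(\beta+M)\alpha n},$$
where we have used $\lfloor\alpha n\rfloor > \alpha n - 1$. Taking $\alpha$ to be as stated (so that $(\beta+M)\alpha = M + \delta\beta$) and noting that $1-\alpha \in (0,1)$, we get
$$\frac1n\log S_{\lfloor\alpha n\rfloor+1}^n(e) \le \frac{\log A}{n} + \frac{n+1}{n}M + \frac{\log((1-\alpha)n+1)}{n} - M - \delta\beta \le -\frac{\delta\beta}{2}$$
as long as $n \ge n_0(A, M, \beta, \delta)$ for some $n_0$ large enough. □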
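Proof of Lemma 14. Apply Lemma 12 with $u$ replaced with $u\vee m_e$. To ensure $u\vee m_e \le \lfloor\alpha n\rfloor$, let $n \ge \frac1\alpha(k\vee m_e+1)$. To ensure that the bound of Lemma 12 holds, let $n\varepsilon \ge c_0k$. Since these two conditions are met if $n\varepsilon \ge \frac1\alpha(k+m_e) + c_0k$, the result follows. (Note also that $\frac1\alpha$ is a positive constant by our choice of $\alpha$.) □

Proof of Lemma 15. Let $\alpha = \frac{\beta+2M}{2(\beta+M)}$ and, as in Lemma 14, assume $n \ge \frac1\alpha(k\vee m_e+1)$ so that $u\vee m_e \le \lfloor\alpha n\rfloor$. (This is just to make sure that sums ranging from $u\vee m_e$ to $\lfloor\alpha n\rfloor$ are not vacuous.) By Lemma 13, we have
$$\frac1n\log S_{\lfloor\alpha n\rfloor+1}^n(e) \le -\frac14\beta,$$
and by Lemma 14, $\frac1n\log S_{u\vee m_e}^{\lfloor\alpha n\rfloor}(e) \overset{\varepsilon}{\asymp} I_e$. Now, we can break up the sum and use the log-sum-max rule (R7),
$$\frac1n\log S_{u\vee m_e}^n(e) = \frac1n\log\Big(S_{u\vee m_e}^{\lfloor\alpha n\rfloor}(e) + S_{\lfloor\alpha n\rfloor+1}^n(e)\Big) \overset{\varepsilon}{\asymp} \max\Big\{\frac1n\log S_{u\vee m_e}^{\lfloor\alpha n\rfloor}(e),\ \frac1n\log S_{\lfloor\alpha n\rfloor+1}^n(e)\Big\} \overset{\varepsilon}{\asymp} I_e,$$
where the last $\overset{\varepsilon}{\asymp}$ follows from rule (R6). □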

The next step is to move from $S_{u\vee m_e}^n(e)$ to $S_u^n(e)$. We need a couple of lemmas. To simplify notation, throughout this section, let
$$\ell := \ell(u,m_e) := u\vee m_e. \tag{43}$$
We occasionally drop the dependence of $\ell$ on $u$ and $m_e$ (although this is implicitly assumed). We note that all the lemmas established so far in this section hold if we replace $[k]$ in their statements with $[2k]$ (or any other constant multiple of $k$). For the rest of this subsection, we will use the full superscript notation $S_u^{v,n}(e)$ introduced in (23).

Lemma 16. For $u \in [2k]$, we have $\frac1n\log S_{u-1}^{\ell-1,\ell-1}(e) \overset{\varepsilon}{\asymp} 0$ as $n\to\infty$ w.r.t. $\{\mathbb P^{m_*}\}$.

Lemma 17. For $u \in [2k]$, we have $\frac1n\log S_{u-1}^{\ell-1,n}(e) \overset{\varepsilon}{\asymp} I_e$ as $n\to\infty$ w.r.t. $\{\mathbb P^{m_*}\}$.

Lemma 18. For $u \in [2k]$, we have $\frac1n\log S_{u-1}^{n,n}(e) \overset{\varepsilon}{\asymp} I_e$ as $n\to\infty$ w.r.t. $\{\mathbb P^{m_*}\}$.

Lemma 19. For $u \in [k]$, we have $\frac1n\log S_u^{\infty,n}(e) \overset{\varepsilon}{\asymp} I_e$ as $n\to\infty$ w.r.t. $\{\mathbb P^{m_*}\}$.

The last lemma completes the proof of the statement in Theorem 2 regarding the S terms.

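Proof of Lemma 16. For $p \le \ell-1$, $e^{-M(\ell-p)} \le R_p^{\ell-1}(e) \le e^{M(\ell-p)}$, by the boundedness assumption. Hence, we have
$$A\,e^{\beta-M}e^{-\beta\ell} \le S_{u-1}^{\ell-1,\ell-1}(e) \le \sum_{p=u-1}^{\ell-1} A\,e^{-\beta p}e^{M(\ell-p)} \le \frac{A\,e^{M\ell}}{1-e^{-(\beta+M)}},$$
where the lower bound keeps only the term $p = \ell-1$. Let $C_1 = \big|\log\frac{A}{1-e^{-(\beta+M)}}\big|$ and $C_2 = |\log(Ae^{\beta-M})|$. We have
$$-\frac{C_2}{n} - \frac{\beta\,((2k)\vee m_e)}{n} \le \frac1n\log S_{u-1}^{\ell-1,\ell-1}(e) \le \frac{C_1}{n} + \frac{M\,((2k)\vee m_e)}{n},$$
where we have used $\ell \le (2k)\vee m_e$, which follows from definition (43) and the assumption $u \in [2k]$. It follows that $\big|\frac1n\log S_{u-1}^{\ell-1,\ell-1}(e)\big| \le \varepsilon$ if we take $n\varepsilon \ge C_3(k+m_e)$, proving the result. □

Proof of Lemma 17. For any $p \in \{u-1,\dots,\ell-1\}$, we have by (41),
$$R_p^n(e) = R_p^{\ell-1}(e)\,R_\ell^n(e).$$
It follows from the definition of the S term that
$$S_{u-1}^{\ell-1,n}(e) = R_\ell^n(e)\sum_{p=u-1}^{\ell-1} A\,e^{-\beta p}R_p^{\ell-1}(e) = R_\ell^n(e)\,S_{u-1}^{\ell-1,\ell-1}(e).$$
The result now follows from Lemmas 8 and 16. □

Proof of Lemma 18. We have $S_{u-1}^{n,n}(e) = S_{u-1}^{\ell-1,n}(e) + S_\ell^{n,n}(e)$. The result now follows from Lemmas 17 and 15 and the log-sum-max rule (R7). □

Note that since $[k+1] \subset [2k]$, it follows that $\frac1n\log S_u^{n,n}(e) \overset{\varepsilon}{\asymp} I_e$ for all $u \in [k]$. The final step is to move from $S_u^{n,n}(e)$ to $S_u^{\infty,n}(e)$.

Proof of Lemma 19. We have $S_u^{\infty,n}(e) = S_u^{n,n}(e) + S_{n+1}^{\infty,n}(e)$. Since $R_p^n(e) = 1$ for all $p > n$ (by convention), we have
$$S_{n+1}^{\infty,n}(e) = \sum_{p=n+1}^\infty A\,e^{-\beta p} = \frac{A}{1-e^{-\beta}}\,e^{-\beta(n+1)}.$$
It follows that $\frac1n\log S_{n+1}^{\infty,n}(e) \overset{\varepsilon}{\asymp} -\beta$. Then, by rules (R7) and (R6),
$$\frac1n\log S_u^{\infty,n}(e) \overset{\varepsilon}{\asymp} \max\Big\{\frac1n\log S_u^{n,n}(e),\ \frac1n\log S_{n+1}^{\infty,n}(e)\Big\} \overset{\varepsilon}{\asymp} \max\{I_e, -\beta\} = I_e,$$
where we have used Lemma 18. □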

8.2 Bounding M-terms

With some work, we can reduce bounding M-terms to that of bounding R and S-terms.

Lemma 20. For $u \in [k]$, we have $\frac1n\log M_u^n(e) \overset{\varepsilon}{\asymp} I_e$ as $n\to\infty$ w.r.t. $\{\mathbb P^{m_*}\}$.
Proof. Let $n \ge k$, so that the sums are not vacuous. For $q \in \{u,\dots,n\}$ the cardinality of the set $\{(p_1,p_2): u \le p_1,p_2 \le n,\ p_1\wedge p_2 = q\}$ is $2(n-q)+1$. Hence,
$$M_u^n(e) \le \sum_{p_1=u}^n\sum_{p_2=u}^n A\,e^{-(\beta_1\wedge\beta_2)(p_1\wedge p_2)} R_{p_1\wedge p_2}^n(e) = A\sum_{q=u}^n [2(n-q)+1]\,e^{-(\beta_1\wedge\beta_2)q} R_q^n(e) \le 2n \sum_{q=u}^n A\,e^{-(\beta_1\wedge\beta_2)q}R_q^n(e).$$
Note that this last sum is of the form $S_u^n(e)$. For the lower bound, we use the first term of the sum, $M_u^n(e) \ge A\,e^{-(\beta_1+\beta_2)u}R_u^n(e)$. Since $u \le k$, we have
$$\frac1n\log R_u^n(e) - \frac{|\log A| + (\beta_1+\beta_2)k}{n} \le \frac1n\log M_u^n(e) \le \frac{\log(2n)}{n} + \frac1n\log S_u^n(e).$$
The only new term (with respect to what was established earlier) is $\log n/n$, which is $\overset{\varepsilon}{\asymp} 0$. This can be seen by noting that $|\log n/n| \le \varepsilon$ if $\sqrt n\,\varepsilon \ge 1$. The result now follows from Lemmas 10 and EH. □
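The counting fact used at the start of this proof, namely that exactly $2(n-q)+1$ pairs $(p_1,p_2)$ in $\{u,\dots,n\}^2$ have minimum $q$, is easy to confirm by brute force. The helper name below is hypothetical.

```python
def count_pairs_with_min(u, n, q):
    """Count pairs (p1, p2) with u <= p1, p2 <= n and min(p1, p2) == q by brute force."""
    return sum(
        1
        for p1 in range(u, n + 1)
        for p2 in range(u, n + 1)
        if min(p1, p2) == q
    )

u, n = 3, 12
for q in range(u, n + 1):
    # n - q + 1 pairs with p1 = q, plus n - q pairs with p2 = q and p1 > q.
    assert count_pairs_with_min(u, n, q) == 2 * (n - q) + 1
print("count matches 2(n - q) + 1 for all q in", (u, n))
```

Since $2(n-q)+1$ grows only linearly in $n$, this multiplicity contributes a $\log n / n$ correction to the rate and does not change the limit.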

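To move from $M_u^n(e)$ to $M_u^{\infty}(e)$, we introduce the following extended notation
$$M_{a,b}^{\bar a,\bar b}(e) := M_{a,b}^{\bar a,\bar b;\,n}(e) := \sum_{p_1=a}^{\bar a}\sum_{p_2=b}^{\bar b} A\,e^{-(\beta_1p_1+\beta_2p_2)} R_{p_1\wedge p_2}^n(e), \tag{44}$$
so that $M_u^{\infty}(e) = M_{u,u}^{\infty,\infty}(e)$.

Lemma 21. For $u \in [k]$, we have $\frac1n\log M_u^{\infty}(e) \overset{\varepsilon}{\asymp} I_e$ as $n\to\infty$ w.r.t. $\{\mathbb P^{m_*}\}$.

Proof. Let $n \ge k$. The strategy is to break up the sum as
$$M_{u,u}^{\infty,\infty}(e) = M_{u,u}^{n,n}(e) + M_{n+1,u}^{\infty,n}(e) + M_{u,n+1}^{n,\infty}(e) + M_{n+1,n+1}^{\infty,\infty}(e) \tag{45}$$
and then apply the log-sum-max rule (R7). The first term is taken care of by Lemma 20. For the second term, we have
$$M_{n+1,u}^{\infty,n}(e) = \sum_{p_1=n+1}^\infty\sum_{p_2=u}^n A\,e^{-(\beta_1p_1+\beta_2p_2)} R_{p_2}^n(e).$$
Applying Lemma 18, we get $\frac1n\log M_{n+1,u}^{\infty,n}(e) \overset{\varepsilon}{\asymp} -\beta_1 + I_e$. The third term in (45) is similar. Recalling that $R_p^n(e) = 1$ for $p > n$, the fourth term, $M_{n+1,n+1}^{\infty,\infty}(e)$, is equal to $C_2\,e^{-(\beta_1+\beta_2)(n+1)}$. Hence, by (R7),
$$\frac1n\log M_u^{\infty}(e) \overset{\varepsilon}{\asymp} \max\{I_e,\ -\beta_1+I_e,\ -\beta_2+I_e,\ -\beta_1-\beta_2\} = I_e.$$
□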
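8.3 Bounding L-terms

Lemma 22. For $u, r \in [k]$, we have $\frac1n\log L_{u,r}(e) \overset{\varepsilon}{\asymp} I_e$ as $n\to\infty$ w.r.t. $\{\mathbb P^{m_*}\}$.

Proof. First consider the case $u \ge r$. Then, we have
$$L_{u,r}(e) = \sum_{p=u}^\infty A\,e^{-\beta p}R_r^n(e) = C_1\,e^{-\beta u}R_r^n(e).$$
Since $\big|\frac{\beta u}{n}\big| \le \frac{\beta k}{n}$, we have $\frac1n\log(C_1e^{-\beta u}) \overset{\varepsilon}{\asymp} 0$. The result now follows from Lemma 10. For the case $u < r$, we have
$$L_{u,r}(e) = \sum_{p=u}^{r-1} A\,e^{-\beta p}R_p^n(e) + \sum_{p=r}^\infty A\,e^{-\beta p}R_r^n(e) = S_u^{r-1}(e) + C_1\,e^{-\beta r}R_r^n(e). \tag{46}$$
Let $n \ge k$ so that $n \ge r$. Note that $A\,e^{-u\beta}R_u^n(e) \le S_u^{r-1}(e) \le S_u^n(e)$. It follows from Lemmas EES and QUI and $\frac1n\log(Ae^{-\beta u}) \overset{\varepsilon}{\asymp} 0$ that $\frac1n\log S_u^{r-1}(e) \overset{\varepsilon}{\asymp} I_e$. Applying rule (R7) to (46) and using a similar argument for the second term, we get the result. □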

9 Conclusion 

We have introduced a graphical model framework which allows for modeling and detection of 
multiple change points in networks. Within this framework, we proposed stopping rules for 
the detection of change points and particular functionals of them (the minimum over a subset), 
based on thresholding the posterior probabilities. A message passing algorithm for efficient 
computation of these posteriors was derived. It was also shown that the proposed rules are 
asymptotically optimal in terms of their expected delay, within the Bayesian framework. 

Let us discuss some directions for possible extension of this work. The assumption that the 
distribution of shared (edge) information between two nodes only depends on the minimum of 
the associated change points (cf. discussion after equation ([3])) might be restrictive in practice. 
The current assumption simplifies the analysis in many places and it has an impact on the 
asymptotic delay. For example, we suspect that the "no gain" phenomenon in asymptotic 
delay for detection of a single change point, discussed in Remark 3 after Theorem [H is due to 
this rather simplistic assumption. It will be interesting to be able to extend the analysis to 
a model which allows for a more general dependence on the two change points. At present, 
however, we do not know how much of our analysis can be carried over to the general case. 

It is possible to derive an approximate message passing algorithm with computational cost scaling as $O(|V| + |E|)$ for each time step $n$. That is, the computational cost is constant in time $n$. Simulations indicate that this fast algorithm approximates the exact message passing well. The presentation of the algorithm and its theoretical analysis will be deferred to a future publication.

As was discussed in the remarks after Theorems 1 and 2, the assumptions on the likelihood ratio, i.e., the boundedness, and the priors, i.e., exponential tail decay, are crucial to our proof. They seem to strike the right balance between the prior and the likelihood and they also allow for the break-up of the analysis of the rather complicated likelihood ratios (cf. (28)) into simpler pieces. This is in contrast to the more classical case of a single change point where the analysis goes through seamlessly, say, irrespective of the tail behavior of the priors [18]. Whether these limitations are genuinely present in the multiple change point model or are artifacts of the proof technique is not clear at this point.

Finally, although our main focus in this paper was on the Bayesian formulation, we note 
that there are non-Bayesian optimality criteria for the single-change point problem, e.g., the 
minimax as considered in [19]. It is an interesting question whether one can derive minimax 
optimal rules for the model we consider here. 

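A Proof of Lemma 2

Consider, for example, node $i_1$ and let $j$ be one of its neighbors in $G$, i.e. $\{i_1,j\} \in E$. Let $k_1 \ge n+1$. Then $P(X_{i_1}^n \mid \lambda_{i_1} = k_1) = \prod_{t=1}^n g_{i_1}(X_{i_1}^t) = P(X_{i_1}^n \mid \lambda_{i_1} = n+1)$. Similarly, the distribution of $X_{i_1j}^n$ given $\lambda_{i_1} = k_1$ and $\lambda_j$ is independent of the particular value of $k_1$, that is,
$$P(X_{i_1j}^n \mid \lambda_{i_1} = k_1, \lambda_j) = P(X_{i_1j}^n \mid (n+1)\wedge\lambda_j) = P(X_{i_1j}^n \mid \lambda_{i_1} = n+1, \lambda_j).$$

Let $\mathcal J := \{i_1,\dots,i_r\}$ and $\mathcal J^c = [d]\setminus\mathcal J$. Pick $k_j \ge n+1$ for $j \in \mathcal J$. Then, the argument above applied to each node in $\mathcal J$ shows that
$$\begin{aligned} P(X_*^n \mid \lambda_j = k_j,\ j\in\mathcal J) &= \sum_{\lambda_*} P(X_*^n\mid\lambda_*)\,P(\lambda_*\mid\lambda_j = k_j,\ j\in\mathcal J)\\ &= \sum_{\lambda_\ell,\ \ell\in\mathcal J^c} P(X_*^n \mid \lambda_\ell,\ \ell\in\mathcal J^c,\ \lambda_j = k_j,\ j\in\mathcal J)\,P(\lambda_\ell,\ \ell\in\mathcal J^c)\\ &= \sum_{\lambda_\ell,\ \ell\in\mathcal J^c} P(X_*^n\mid\lambda_\ell,\ \ell\in\mathcal J^c,\ \lambda_j = n+1,\ j\in\mathcal J)\,P(\lambda_\ell,\ \ell\in\mathcal J^c), \end{aligned}$$
where the second equality follows by independence of $\{\lambda_j\}$ a priori. As the last expression does not depend on $\{k_j\}$, the proof is complete.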

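B Proof of Lemma 5

Let $k_* = (k_1,\dots,k_d) \in \mathbb N^d$ be a multi-index. We have
$$\begin{aligned}P(X_*^n\mid\phi=k) &= \sum_{k_*\in\mathbb N^d} P(X_*^n\mid\lambda_*=k_*)\,P(\lambda_*=k_*\mid\phi=k)\\ &= \sum_{k_*\in\mathbb N^d}\Big\{\prod_{e\in\widetilde E} P(X_e^n\mid\lambda_e=k_e)\Big\}\pi_\phi^k(k_*),\end{aligned}$$
where we have used the extended edge notation of Section 4 and the conditional distribution introduced in (16). Using the pre- and post-change densities, we get
$$P(X_*^n\mid\phi=k) = \sum_{k_*\in\mathbb N^d}\Big\{\prod_{e\in\widetilde E}\Big[\prod_{t=1}^{k_e-1}g_e(X_e^t)\prod_{t=k_e}^n f_e(X_e^t)\Big]\Big\}\pi_\phi^k(k_*), \tag{47}$$
where by convention, empty products are equal to 1. Dividing (47) by $\prod_{e\in\widetilde E}\prod_{t=1}^n g_e(X_e^t)$, we obtain
$$\frac{P(X_*^n\mid\phi=k)}{\prod_{e\in\widetilde E}\prod_{t=1}^n g_e(X_e^t)} = \sum_{k_*\in\mathbb N^d}\Big\{\prod_{e\in\widetilde E} R_{k_e}^n(e)\Big\}\pi_\phi^k(k_*) = M_\phi^{k,n},$$
where we have used definitions (22) and (28). The same expression holds if we replace $k$ with $\infty$. The result now follows from definition (17) of $D_\phi^{k,n}$.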



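C Proof of Lemma 6

The idea of the proof is to write the sum as the diagonal part of a higher dimensional one and then drop the restriction to the diagonal. Let us illustrate the idea first by proving (30). We can write
$$\sum_{p\in S_1} e^{-\beta_1p}F_1(p)H_1(p) = \sum_{p\in S_1}\sum_{q\in S_1} 1\{p=q\}\,e^{-\frac{\beta_1}{2}(p+q)}F_1(p)H_1(q) \le \sum_{p\in S_1}\sum_{q\in S_1} e^{-\frac{\beta_1}{2}(p+q)}F_1(p)H_1(q).$$
The bound holds since the terms are nonnegative. Now, the RHS factors over $p$ and $q$ and we get the right-hand side of (30).

The idea for the proof of (31) is similar. For every pair $(i,j) \in \binom{[r]}{2}$ we introduce new versions of $k_i$ and $k_j$ so that the corresponding term $G_{ij}(k_i,k_j)$ involves the new variables. To be more precise, let $\mathcal K = \{\nu_1,\dots,\nu_K\}$ be an enumeration of the elements of $\binom{[r]}{2}$. To each element $\nu_\ell = (i,j) \in \mathcal K$ with $i < j$, associate variables $u_\ell^1$ and $u_\ell^2$, representing newer versions of $k_i$ and $k_j$. In other words, $u_\ell^1$ is the new version of $k_{\nu_\ell(1)}$ and $u_\ell^2$ is the new version of $k_{\nu_\ell(2)}$.

This procedure introduces $2K$ extra variables. To each of the original $k_i$ variables, there correspond exactly $r-1$ new versions. Letting
$$T := \sum_{k}\Big\{e^{-\beta^Tk}\prod_{j=1}^r F_j(k_j)\prod_{(i,j)\in\binom{[r]}{2}} G_{ij}(k_i,k_j)\Big\}$$
denote the LHS of (31), we have
$$T = \sum_{\{k_j\},\{(u_\ell^1,u_\ell^2)\}} 1\{u_\ell^a = k_{\nu_\ell(a)}\ \forall\,\ell, a\}\,\exp\Big[-\frac1r\sum_{j}\beta_jk_j - \frac1r\sum_{\ell}\big(\beta_{\nu_\ell(1)}u_\ell^1 + \beta_{\nu_\ell(2)}u_\ell^2\big)\Big]\prod_{j}F_j(k_j)\prod_{\ell}G_{\nu_\ell}(u_\ell^1,u_\ell^2),$$
where the summation is over $\{k_j \in S_j,\ u_\ell^1 \in S_{\nu_\ell(1)},\ u_\ell^2 \in S_{\nu_\ell(2)},\ j \in [r],\ \ell \in [K]\}$. Dropping the indicator, we get an upper bound which separates,
$$\begin{aligned}T &\le \sum_{\{k_j\},\{(u_\ell^1,u_\ell^2)\}}\Big\{e^{-\frac1r\sum_j\beta_jk_j}\prod_{j}F_j(k_j)\Big\}\prod_{\ell\in[K]}\Big\{e^{-\frac1r(\beta_{\nu_\ell(1)}u_\ell^1+\beta_{\nu_\ell(2)}u_\ell^2)}G_{\nu_\ell}(u_\ell^1,u_\ell^2)\Big\}\\ &= \prod_{j}\Big\{\sum_{k_j}e^{-\frac{\beta_j}{r}k_j}F_j(k_j)\Big\}\prod_{\ell}\Big\{\sum_{(u_\ell^1,u_\ell^2)}e^{-\frac1r(\beta_{\nu_\ell(1)}u_\ell^1+\beta_{\nu_\ell(2)}u_\ell^2)}G_{\nu_\ell}(u_\ell^1,u_\ell^2)\Big\},\end{aligned}$$
which is the desired result.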
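The diagonal argument can be tested numerically for small $r$. The sketch below assumes the bound takes the form in which each exponent $\beta_i k_i$ is split equally among the $r$ copies of $k_i$ (one in the $F_i$ factor and $r-1$ in the pair factors), and evaluates both sides on the finite index sets $S_j = \{1,\dots,\text{size}\}$. The function name `lemma6b_sides` and the random inputs are hypothetical.

```python
import itertools
import math
import random

def lemma6b_sides(beta, F, G, size):
    """Return (lhs, rhs) of the decoupling bound with every S_j = {1, ..., size}."""
    r = len(beta)
    ks = range(1, size + 1)
    pairs = list(itertools.combinations(range(r), 2))
    # Left side: one joint sum over the multi-index k = (k_1, ..., k_r).
    lhs = 0.0
    for k in itertools.product(ks, repeat=r):
        term = math.exp(-sum(b * kj for b, kj in zip(beta, k)))
        for j in range(r):
            term *= F[j][k[j] - 1]
        for i, j in pairs:
            term *= G[(i, j)][k[i] - 1][k[j] - 1]
        lhs += term
    # Right side: product of decoupled single sums and pairwise double sums.
    rhs = 1.0
    for j in range(r):
        rhs *= sum(math.exp(-beta[j] * kj / r) * F[j][kj - 1] for kj in ks)
    for i, j in pairs:
        rhs *= sum(
            math.exp(-(beta[i] * ki + beta[j] * kj) / r) * G[(i, j)][ki - 1][kj - 1]
            for ki in ks
            for kj in ks
        )
    return lhs, rhs

rng = random.Random(1)  # fixed seed for reproducibility
r, size = 3, 4
beta = [0.2, 0.5, 0.9]
F = [[rng.uniform(0, 2) for _ in range(size)] for _ in range(r)]
G = {
    pair: [[rng.uniform(0, 2) for _ in range(size)] for _ in range(size)]
    for pair in itertools.combinations(range(r), 2)
}
lhs, rhs = lemma6b_sides(beta, F, G, size)
print(lhs <= rhs)
```

For $r = 1$ there are no pairs and the two sides coincide, which matches the fact that all slack comes from dropping the indicator.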



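D Proof of Lemma 4

(a) We start by proving (18). Pick $n_0$ large enough so that for $n \ge n_0$, we have $q(n) \le Cn^a$ for some numerical constant $a \in \mathbb N$. Fix some $k \in \mathbb N$ throughout the proof. For now, fix $m_* \in \mathbb N^d$ such that $\pi_\phi^k(m_*) > 0$. Pick $\varepsilon_n := \sqrt{\frac{\gamma\log n}{c_1n}} \wedge \varepsilon_0$ for some $\gamma$ to be determined shortly. Note that $\sqrt n\,\varepsilon_n \ge p(m_*,k)$ is equivalent to $\sqrt{\frac{\gamma}{c_1}\log n} \wedge (\varepsilon_0\sqrt n) \ge p(m_*,k)$, which holds for sufficiently large $n$. Let $n_1 := n_1(m_*,k)$ be the smallest $n$ for which this inequality holds. Using the shorthand notation $D^{k,n} = D_\phi^k(X_*^n)$, we have for $n \ge \max\{n_0,n_1\}$,
$$\mathbb P^{m_*}\Big\{\Big|\frac1n\log D^{k,n} - I_\phi\Big| > \varepsilon_n\Big\} \le Cn^a\exp(-c_1n\varepsilon_n^2) = Cn^{a-\gamma}.$$
Taking $\gamma = a+2$, we have by the Borel-Cantelli lemma that $\mathbb P^{m_*}\big\{\frac1n\log D^{k,n} \to I_\phi\big\} = 1$. It follows that the sequence $\big\{\frac{1}{k+n-1}\log D^{k,k+n-1}\big\}_n$ has the same limit a.s. $\mathbb P^{m_*}$. Since $k$ is fixed for now, $\frac{k+n-1}{n} \to 1$ as $n\to\infty$, hence $\mathbb P^{m_*}\big\{\frac1n\log D^{k,k+n-1} \to I_\phi\big\} = 1$.

We now take the average with respect to the conditional distribution of $\lambda_*$ given $\phi = k$. That is, we multiply by $\pi_\phi^k(m_*)$ and sum over $m_*$ to obtain
$$\mathbb P_\phi^k\Big\{\frac1n\log D^{k,k+n-1} \to I_\phi\Big\} = 1. \tag{48}$$
For any sequence of numbers $\{b_n\}_{n\in\mathbb N}$, $\frac1n b_n \to b > 0$ implies that $\frac1N\max_{1\le n\le N} b_n \to b$ (see footnote 2). Thus, it follows from (48) that
$$\mathbb P_\phi^k\Big\{\frac1N\max_{1\le n\le N}\log D^{k,k+n-1} \to I_\phi\Big\} = 1. \tag{49}$$
Since convergence a.s. implies convergence in probability, this implies (18).

2 Here is the proof. Fix $\varepsilon \in (0, b/2)$ and pick $n_0$ so that for $n \ge n_0$, $|\frac1n b_n - b| \le \varepsilon$. Let $B_p^q := \frac1N\max_{p\le n\le q} b_n$. We have $B_1^N = \max\{B_1^{n_0-1}, B_{n_0}^N\}$. We can pick $N_0$ such that for all $N \ge N_0$, $B_1^{n_0-1} \le \varepsilon$. On the other hand, $n(b-\varepsilon) \le b_n \le n(b+\varepsilon)$ for $n \in [n_0,N]$. Taking the maximum of each side over this interval, we obtain $N(b-\varepsilon) \le NB_{n_0}^N \le N(b+\varepsilon)$. Since $2\varepsilon < b$, we have $B_1^N = B_{n_0}^N$ and $|B_{n_0}^N - b| \le \varepsilon$, which implies the result.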
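The elementary fact behind the passage from the sequence to its running maximum (if $b_n/n \to b > 0$ then $\frac1N\max_{n\le N} b_n \to b$) can be illustrated on a toy sequence. The sequence $b_n = 2n + 5\sqrt n\,\sin n$ and the helper name below are hypothetical choices made only for this illustration.

```python
import math

def running_max_ratio(b):
    """Given b = (b_1, ..., b_N), return the list of (1/N') * max_{n <= N'} b_n for N' = 1..N."""
    ratios, best = [], -math.inf
    for count, value in enumerate(b, start=1):
        best = max(best, value)
        ratios.append(best / count)
    return ratios

limit = 2.0
# b_n / n -> 2, with an oscillating perturbation of order sqrt(n).
b = [limit * n + 5.0 * math.sqrt(n) * math.sin(n) for n in range(1, 10001)]
ratios = running_max_ratio(b)
for N in (10, 100, 1000, 10000):
    print(N, b[N - 1] / N, ratios[N - 1])  # both columns approach the limit
```

The running maximum is eventually attained near the end of the range, which is the mechanism used in the footnote argument.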
(b) To prove (19), let us fix $\varepsilon \in (0,\varepsilon_0)$ throughout. Changing $n$ to $n-k+1$ in the definition of $T_\varepsilon^k$, we obtain
$$\begin{aligned}T_\varepsilon^k &= -k+1+\sup\Big\{n\ge k:\ \frac{1}{n-k+1}\log D^{k,n} < I_\phi-\varepsilon\Big\}\\ &= -k+1+\sup\Big\{n\ge k:\ \frac1n\log D^{k,n} < \frac{n-k+1}{n}(I_\phi-\varepsilon)\Big\}\\ &\le -k+1+\sup\Big\{n\ge 1:\ \frac1n\log D^{k,n} < I_\phi-\varepsilon\Big\}.\end{aligned}$$
Thus, it is enough to verify (19) for the last supremum in place of $T_\varepsilon^k$.

Let $Y^{n,k} := \frac1n\log D^{k,n}$. For $m_* \in \mathbb N^d$, let $n_2 := n_2(k,m_*;\varepsilon)$ be the smallest integer $n$ that satisfies $\sqrt n \ge \frac1\varepsilon p(m_*,k)$, that is
$$n_2(k,m_*;\varepsilon) := \Big\lceil\frac{1}{\varepsilon^2}p^2(m_*,k)\Big\rceil \le \frac{1}{\varepsilon^2}p^2(m_*,k)+1. \tag{50}$$
Let $n_0$ be as in the previous part. By assumption, for all $n \ge \max\{n_2,n_0\}$, we have $\mathbb P^{m_*}\{|Y^{n,k} - I_\phi| > \varepsilon\} \le Cn^a\exp(-n\varepsilon^2)$. To simplify notation, we will assume $n_0 = 1$ without loss of generality. We have

oo 

K: K k ] = £ pr; @ > *) = £ p t ( U { yn,fc < J * - e >) 

£=1 £>1 n>£ 

£>1 n>£ 
oo 

= £(n-l)P™*(y n ' fc < J^-e) 

71=1 
712— 1 

< £ (n - 1) + £ Cn a e~ n£2 

n=l n>n,2 

< (n2 " 1)2 +f;Cn a e- 2 . 



n=l 



The second term on the RHS does not depend on $m_*$ or $k$, and we can denote it as $C_1(\varepsilon)$. Using the bound (50) on $n_2$, we have



[T e k ] 4E ^(m*)p 4 (m # ,fc) + Cx( £ ). 



2e 4 

m,€N 



Since by assumption, both $\pi_\phi^k(\cdot)$ and $\mathbb P(\phi = \cdot)$ have finite polynomial moments, it follows that (19) holds for the upper bound used in place of $T_\varepsilon^k$.